Abstract: A striking difference between brain-inspired neuromorphic processors and current von Neumann processor architectures is the way in which memory and processing are organized. As Information and Communication Technologies continue to address the need for increased computational power through the increase of cores within a digital processor, neuromorphic engineers and scientists can complement this need by building processor architectures where memory is distributed with the processing. In this paper we present a survey of brain-inspired processor architectures that support models of cortical networks and deep neural networks. These architectures range from serial clocked implementations of multi-neuron systems to massively parallel asynchronous ones, and from purely digital systems to mixed analog/digital systems which implement more biologically realistic models of neurons and synapses together with a suite of adaptation and learning mechanisms analogous to the ones found in biological nervous systems. We describe the advantages of the different approaches being pursued and present the challenges that need to be addressed for building artificial neural processing systems that can display the richness of behaviors seen in biological systems.


I. Introduction 

Neuromorphic information processing systems consist of electronic circuits and devices built using design principles that are based on those of biological nervous systems [1], [2], [3], [4]. The circuits are typically designed using mixed-mode analog/digital Complementary Metal-Oxide-Semiconductor (CMOS) transistors and fabricated using standard Very Large Scale Integration (VLSI) processes. Similar to the biological systems that they model, neuromorphic systems process information using energy-efficient, asynchronous, event-driven methods [5]. They are often adaptive and fault-tolerant, and can be flexibly configured to display complex behaviors by combining multiple instances of simpler elements. The most striking difference between neuromorphic systems and conventional information processing systems is in their use of memory structures. While computing systems based on the classical von Neumann architecture have one or more central processing units physically separated from the main memory areas, both biological and artificial neural processing systems are characterized by co-localized memory and computation (see Fig. 1): the synapses of the neural network implement at the same time memory storage as well as complex non-linear operators used to perform collective and distributed computation. Given that memory-related constraints, such as size, access latency and throughput, represent one of the major performance bottlenecks in conventional computing architectures [6], and given the clear ability of biological nervous systems to perform robust computation using memory and computing elements that are slow, inhomogeneous, stochastic and faulty [7], [8], neuromorphic brain-inspired computing paradigms offer an attractive solution for implementing alternative non-von Neumann architectures using advanced and emerging technologies.

Fig. 1: Memory hierarchies in brains and computers. (a) In brains, neurons and synapses are the fundamental elements of both neural computation and memory formation. Multiple excitatory and inhibitory neurons, embedded in recurrent canonical microcircuits, form basic computational primitives that can carry out state-dependent sensory processing and computation. Multiple clusters of recurrent networks are coupled together via long-distance connections to implement sensory fusion, inference, and symbolic manipulation. (b) In computers, Central Processing Units (CPUs) containing multiple cores are connected to both main memory and peripheral memory blocks. Each core comprises a microprocessor and local memory (e.g., local registers and Level-1 cache). All cores typically share access to another block of fast memory integrated on the same chip (e.g., Level-2 cache). The main memory block is the primary storage area; it is typically larger than the memory blocks inside the CPU, but requires longer access times. The peripheral memory block requires even longer access times, but can store significantly larger amounts of data.

This neuromorphic engineering approach, originally proposed in the late eighties [9] and pursued throughout the nineties [2], [10], [11] and early 2000s [12], [13], [14] by a small number of research labs worldwide, is now being adopted by an increasing number of both academic and industrial research groups. In particular, there have been many recent publications describing the use of new materials and nano-technologies for building nano-scale devices that can emulate some of the properties observed in biological synapses [15], [16], [17], [18], [19], [20]. At the network and system level, remarkable brain-inspired electronic multi-neuron computing platforms have been developed to implement alternative computing paradigms for solving pattern recognition and machine learning tasks [21], [22] and for speeding up the simulation of computational neuroscience models [22], [23]. These latter approaches, however, are only loosely inspired by biological neural processing systems, and are constrained by both precision requirements (e.g., with digital circuits, to guarantee bit-precise equivalence with software simulations, and with analog circuits, to implement as faithfully as possible the equations and specifications provided by neuroscientists) and bandwidth requirements (e.g., to speed up the simulations by two or three orders of magnitude, or to guarantee that all transmitted signals reach their destinations within some clock cycle duration).

An alternative strategy is to forgo these constraints and emulate biology much more closely by developing new materials and devices, and by designing electronic circuits that exploit their device physics to reproduce the bio-physics of real synapses, neurons, and other neural structures [9], [15], [24], [25], [4]. In CMOS, this can be achieved by using Field-Effect Transistors (FETs) operated in the analog “weak-inversion” or “sub-threshold” domain [12], which naturally exhibit exponential relationships in their transfer functions, similar, for example, to the exponential dependencies observed in the conductance of sodium and potassium channels of biological neurons [26]. In the sub-threshold domain, the main mechanism of carrier transport is diffusion, and many of the computational operators used in neural systems (such as exponentiation, thresholding, and amplification) can be implemented using circuits consisting of only a few transistors, sometimes only one. Therefore, sub-threshold analog circuits require far fewer transistors than digital circuits for emulating certain properties of neural systems. However, these circuits tend to be slow, inhomogeneous, and imprecise. To achieve fast, robust, and reliable information processing in neuromorphic systems designed following this approach, it is necessary to adopt computational strategies that are analogous to the ones found in nature. For fast processing, low latency, and quick response times, these strategies include using massively parallel arrays of processing elements that are asynchronous, real-time, and data- or event-driven (e.g., by responding to or producing spikes). For robust and reliable processing, crucial strategies include both the co-localization of memory and computation, and the use of adaptation and plasticity mechanisms that endow the system with stabilizing and learning properties.
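As a rough illustration of the exponential transfer function mentioned above, the sketch below evaluates the textbook weak-inversion model of a saturated n-type FET with its source tied to ground, I = I0·exp(κ·Vg/UT), where UT = kT/q is the thermal voltage. The current scale I0 = 1 fA and the slope factor κ = 0.7 are assumed, illustrative values, not parameters of any circuit discussed in this paper.

```python
import math

BOLTZMANN = 1.380649e-23             # J/K
ELEMENTARY_CHARGE = 1.602176634e-19  # C


def thermal_voltage(temp_kelvin: float = 300.0) -> float:
    """Thermal voltage U_T = kT/q in volts (about 25.9 mV at 300 K)."""
    return BOLTZMANN * temp_kelvin / ELEMENTARY_CHARGE


def subthreshold_current(v_gate: float, i0: float = 1e-15, kappa: float = 0.7,
                         temp_kelvin: float = 300.0) -> float:
    """Drain current of a saturated nFET in weak inversion, source at ground.

    i0 and kappa are assumed, illustrative values.
    """
    return i0 * math.exp(kappa * v_gate / thermal_voltage(temp_kelvin))


def volts_per_decade(kappa: float = 0.7, temp_kelvin: float = 300.0) -> float:
    """Gate-voltage increase that multiplies the drain current by 10."""
    return thermal_voltage(temp_kelvin) * math.log(10.0) / kappa


if __name__ == "__main__":
    step = volts_per_decade()
    for n in range(4):
        vg = 0.1 + n * step
        print(f"Vg = {vg:.3f} V -> I = {subthreshold_current(vg):.3e} A")
```

With these assumed values, each increase of roughly 85 mV in gate voltage multiplies the current by ten, which is why a single transistor can serve as an exponentiation operator.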

In this paper we will present an overview of current approaches that implement memory and information processing in neuromorphic systems. The systems range from implementations of neural networks using conventional von Neumann architectures, to custom hardware implementations that have co-localized memory and computation elements but are only loosely inspired by biology, to neuromorphic architectures that implement biologically plausible neural dynamics and realistic plasticity mechanisms, merging computation and memory storage within the same circuits. We will highlight the advantages and disadvantages of these approaches, point out which application domains are best suited to them, and describe the conditions under which they can best exploit the properties of new materials and devices, such as oxide-based resistive memories and spin Field-Effect Transistors (spin-FETs).

A. Application areas for neuromorphic systems 

Although the origin of the field of neuromorphic engineering can be traced back to the late ’80s, this field is still relatively young when considering the amount of manpower that has been invested in it. Therefore, there are not yet many well-established products and applications on the market that exploit neuromorphic technology to its full potential. However, it has been argued that there are several areas in which neuromorphic systems offer significant advantages over conventional computers [13], [14], [27], [4], [5], such as sensory processing [28] or “autonomous systems”. An autonomous system can be a simple one, such as a sensory processing system based on environmental sensors or bio-sensors; an intermediate-complexity one, such as a Brain-Machine Interface (BMI) making one- or two-bit decisions based on the real-time on-line processing of small numbers of signals, sensed continuously from the environment [29]; or a complex one, such as a humanoid robot making decisions and producing behaviors based on the outcome of sophisticated auditory or visual processing [30]. These types of autonomous systems can greatly benefit from the extremely compact and low-power features of neuromorphic hardware technology [1], and can take advantage of the neural style of computation that the neuromorphic hardware substrate supports to develop new computing paradigms that are better suited to unreliable sensory signals in uncontrolled environments.

Another application domain where dedicated neural processing hardware systems are being used to complement or even to replace conventional computers is that of custom accelerated simulation engines for large-scale neural modeling [22], [31], or of very large-scale spiking neural networks applied to machine learning problems [21]. In this application area too, dedicated neuromorphic hardware implementations typically consume less power than general-purpose computing architectures used for the same purposes. In Section II, we will describe deep network architectures currently used for machine learning benchmarks and describe examples of neuromorphic hardware network implementations that have been proposed to replace conventional computers used in simulating these architectures. In Section III we will describe examples of hardware systems that have been proposed to implement efficient simulations of large-scale neural models. Section IV will present an overview of the adaptation, learning, and memory mechanisms adopted by the nervous system to carry out computation, and Section V will present an example of a neuromorphic processor that implements such mechanisms and that can be used to endow autonomous behaving systems with learning abilities for adapting to changes in the environment and interacting with it in real time.

Fig. 2: Multi-layer neural networks. (a) Hierarchical convolutional network with feed-forward connections; (b) Deep neural network with all-to-all connections. Converging connections are shown only for two representative neurons (i.e., with large fan-in) in the bottom two layers, while an example of a neuron projecting to multiple targets (i.e., a neuron with large fan-out) is shown only in the second-to-last layer. Connections in (b) can be both feed-forward and feed-back.

II. From CPUs to deep neural networks
Conventional computers based on the von Neumann architecture typically have one or more CPUs physically separated from the program and data memory elements (see Fig. 1b). The CPUs access both data and program memory using the same shared resources. Since there is a limited throughput between the processors and the memory, and since processor speeds are much higher than memory access speeds, CPUs spend most of their time idle. This famous von Neumann bottleneck problem [6] can be alleviated by adding hierarchical memory structures inside the CPUs to cache frequently used data, or by shifting the computational paradigm from serial to parallel. Driven mainly by performance improvement demands, we have been witnessing both strategies in recent years [32]. However, if one takes energy consumption constraints into account, increasing the size or improving the performance of cache memory is not a viable option, because the energy consumption of cache memory is linearly proportional to its size [33]. The alternative strategy is therefore to increase the number of parallel computing elements in the system. Amdahl’s law [34] has often been used to evaluate the performance gains of parallel processing in multi-processor von Neumann architectures [35]. In [36] the authors derive a generalized form of Amdahl’s law which takes into account communication, energy and delay factors, and use it to demonstrate that performance gained from parallelism is more energy efficient than performance gained from advanced memory optimization and micro-architectural techniques. A first step toward the implementation of massively parallel neural processing systems is to explore the use of Graphics Processing Units (GPUs), which typically combine hundreds of parallel cores with high memory bandwidths. Indeed, several neural simulation tools have been proposed for this purpose [37], [38], [39]. However, as demonstrated with the cost function derived from the generalized Amdahl’s law in [36], even conventional GPU architectures are not optimally suited to running these spiking neural network simulations when energy consumption is factored in. This is why new custom hardware-accelerated solutions, with memory access and routing schemes optimized specifically for neural networks, started to emerge. In particular, a new set of digital architectures has been proposed for implementing “deep” multi-layer neural networks, including full custom CMOS solutions [40], [41], [42] and solutions based on Field Programmable Gate Array (FPGA) devices [43], [44].
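For reference, the classic form of Amdahl’s law [34] bounds the speedup of a workload whose fraction p can be parallelized over N processing elements: S(N) = 1 / ((1 - p) + p/N). The minimal sketch below implements only this classic form; the generalized communication/energy/delay form derived in [36] is not reproduced here, and the parallel fraction used in the example is an assumed value.

```python
def amdahl_speedup(parallel_fraction: float, n_units: int) -> float:
    """Classic Amdahl's law: speedup on n_units for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_units < 1:
        raise ValueError("n_units must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)


def amdahl_limit(parallel_fraction: float) -> float:
    """Asymptotic speedup for an unbounded number of units: 1 / (1 - p)."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    p = 0.95  # assumed parallel fraction, for illustration only
    for n in (1, 16, 256, 4096):
        print(f"N = {n:5d}: speedup = {amdahl_speedup(p, n):6.2f}")
    print(f"limit: {amdahl_limit(p):.2f}")
```

Even with 95% of the work parallelized, the speedup saturates near 20, which illustrates why the serial and communication parts of a workload, and not only the number of cores, determine how much a massively parallel architecture can gain.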

A. Deep Networks 

Deep networks are neural networks composed of many layers of neurons (see Fig. 2). They are currently the network architecture of choice in the machine learning community for solving a wide range of classification problems, and have shown state-of-the-art performance in various benchmark tasks such as digit recognition [45]. They include convolutional networks, which are being explored intensively within the neuromorphic community for visual processing tasks [46].

B. Convolutional Networks 

Convolutional networks consist of a multi-layer feed-forward network architecture in which neurons in one layer receive inputs from multiple neurons in the previous layer and produce an output which is a thresholded or sigmoidal function of the weighted sum of their inputs (see Fig. 2 (a)). The connectivity pattern between the units of one layer and a neuron of the subsequent layer, which is responsible for the weighted-sum operation, forms the convolution kernel. Each layer typically has one or a small number of convolution kernels that map the activity of a set of neurons from one layer to the target neuron of the subsequent layer. These networks were originally inspired by the structure of the visual system in mammals [47], [48], and have been used extensively for image processing and machine vision tasks [49], [50], [51]. They are typically implemented on CPUs and GPUs, which consume a substantial amount of power. In recent years, alternative dedicated System-on-Chip (SoC) solutions and FPGA platforms have been used to implement these networks, increasing their performance while decreasing their power consumption. Two main approaches are being pursued, depending on the readout scheme of the vision sensor: a frame-based approach, which uses inputs from conventional frame-based cameras, and an event-driven one, which uses inputs from event-driven retina-like vision sensors [52], [53], [54], [55].
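The weighted-sum-and-threshold operation described above can be sketched in a few lines. This is a minimal, pure-Python illustration of a single convolution kernel followed by either a hard threshold or a sigmoid. It computes a "valid" (unpadded) cross-correlation, in which the kernel is not flipped, as is common in convolutional network implementations; the example image, kernel, and threshold are illustrative assumptions.

```python
import math
from typing import List

Grid = List[List[float]]


def conv2d_valid(image: Grid, kernel: Grid) -> Grid:
    """Slide `kernel` over `image` (no padding) and return the weighted sums."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            acc = 0.0
            for dy in range(kh):
                for dx in range(kw):
                    acc += kernel[dy][dx] * image[y + dy][x + dx]
            out[y][x] = acc
    return out


def threshold_units(sums: Grid, theta: float) -> List[List[int]]:
    """Thresholded output: 1 where the weighted sum exceeds theta, else 0."""
    return [[1 if s > theta else 0 for s in row] for row in sums]


def sigmoid_units(sums: Grid) -> Grid:
    """Sigmoidal alternative to the hard threshold."""
    return [[1.0 / (1.0 + math.exp(-s)) for s in row] for row in sums]


if __name__ == "__main__":
    image = [[0, 0, 1, 1]] * 4          # dark left half, bright right half
    edge_kernel = [[-1, 1], [-1, 1]]    # responds to left-to-right intensity steps
    print(threshold_units(conv2d_valid(image, edge_kernel), theta=1.0))
    # [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
```

Only the output units whose receptive field straddles the intensity step respond, which is the kind of local feature detection that each kernel performs in a convolutional layer.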

1) Frame-Based Solution: An example of a method proposed for implementing scalable multi-layered synthetic vision systems based on a dataflow architecture such as the one shown in Fig. 3 is the “neuFlow” system [56]. The dataflow architecture of this system relies on a 2D grid of Processing Tiles (PTs), where each PT has a bank of operators (such as multiply, divide, add, subtract, and max), a multiplexer-based on-chip router, and a configurable memory mapper block. The architecture is designed to process large streams of data in parallel. It uses a Smart Direct Memory Access (DMA) block which interfaces with off-chip memory and provides asynchronous data transfers with priority management. The DMA can be configured to read or write a particular chunk of data, and sends its status to another block called the Flow-CPU. The Flow-CPU works as a central control unit that can reconfigure the computing grid and the Smart DMA at runtime. The configuration data from the Flow-CPU, placed on a Runtime Configuration Bus, reconfigures most aspects of the grid at runtime. A full-blown compiler, dubbed “LuaFlow”, takes sequential tree-like or flow-graph descriptions of algorithms and parses them to extract different levels of parallelism. With the implementation of this architecture on a Xilinx Virtex-6 ML605 FPGA, the authors demonstrate the segmentation and classification of a street scene using a 4-layer network running at 12 frames/second. The same architecture was also implemented in a custom 45 nm Silicon-on-Insulator (SOI) Application Specific Integrated Circuit (ASIC) chip, which software simulations predicted to have a peak performance of 320 Giga-Operations per Second (GOPS) with a 0.6 W power budget [42]. In comparison, the neuFlow architecture implemented on a standard Xilinx Virtex-6 ML605 FPGA has a peak performance of 16 GOPS with 10 W of power consumption.

PROCEEDINGS OF THE IEEE, VOL. X, NO. X, JUNE 2015

Fig. 3: Example architecture of a convolutional multi-layered network (stages: visual sensor, feature maps (L1), feature maps (L2), classifier).

To cope with networks with larger numbers of layers and with their related memory bandwidth requirements, a scalable low-power system called “nn-X” was presented in [44], [57]. It comprises a host processor, a coprocessor and external memory. The coprocessor includes an array of processing elements called collections, a memory router, and a configuration bus. Each collection contains a convolution engine, a pooling module, and a nonlinear operator. The memory router routes independent data streams to the collections and allows nn-X to access multiple memory buffers at the same time. The nn-X system, as prototyped on the Xilinx ZC706 platform, has eight collections, each with a 10x10 convolution engine, and has a measured performance of 200 GOPS while consuming 4 W.
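Normalizing the figures quoted above by power makes the comparison easier to read. The sketch below simply divides throughput by power. Note that the figures are not like-for-like (the neuFlow ASIC value is a simulated peak, the neuFlow FPGA value is a peak, and the nn-X value is measured), so the ratios are only indicative.

```python
from typing import Dict, Tuple

# (throughput in GOPS, power in W), as quoted in the text.
PLATFORMS: Dict[str, Tuple[float, float]] = {
    "neuFlow, 45 nm SOI ASIC (simulated peak)": (320.0, 0.6),
    "neuFlow, Virtex-6 ML605 FPGA (peak)": (16.0, 10.0),
    "nn-X, Xilinx ZC706 (measured)": (200.0, 4.0),
}


def gops_per_watt(gops: float, watts: float) -> float:
    """Energy efficiency in giga-operations per second per watt."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return gops / watts


if __name__ == "__main__":
    for name, (gops, watts) in PLATFORMS.items():
        print(f"{name:42s} {gops_per_watt(gops, watts):8.1f} GOPS/W")
```

This gives roughly 533, 1.6, and 50 GOPS/W respectively. In other words, the simulated ASIC would be more than two orders of magnitude more efficient than the same architecture mapped onto the FPGA.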

2) Event-Driven Solution: The implementation of event- or spike-based convolutional network chips was first investigated in the spike-based multi-chip neuromorphic vision system “CAVIAR”¹ [58]. This system consists of a front-end retina chip, a set of spiking convolution chips, a winner-take-all chip, and a set of learning chips. While the first convolution chips in CAVIAR were designed using mixed-signal analog/digital circuits, a custom digital version with an array of 32x32 pixels was later implemented in a standard 0.35 µm CMOS process, with an arbitrarily programmable kernel size of up to 32x32 [59]. Extending these chips to implement large-scale networks would require an infrastructure for routing spikes between multiple convolutional chips. In CAVIAR this was done using an asynchronous communication protocol based on the Address-Event Representation (AER) and a set of multi-purpose FPGA routing boards. Currently, convolutional networks are implemented both on full custom ASIC platforms [40] and on FPGA platforms [41]. The latest FPGA implementation is done on a Xilinx Virtex-6, and supports up to 64 parallel convolutional modules of size 64x64 pixels [60]. Here, memory is used to store the states of the pixels of the 64 parallel modules, and the 64 convolutional kernels of size up to 11x11. When a new event arrives, the kernel of each convolution module is projected around a pixel, and a maximum of 121 updates are needed to determine the current state of the convolution. The new state is compared to a threshold to determine if an output spike should be generated, and its value is updated in memory. The event-driven convolutional modules have different memory requirements from the frame-based networks. Since the events arrive asynchronously, the states of the convolutional modules need to be stored in memory at all times. However, since events are processed in sequence, only a single computational adder block is needed for computing the convolution of the active pixel. A four-layer network which recognizes a small subset of digits was demonstrated using this implementation [60]. Scaling up the network further will require more logic gate resources or a custom digital platform that can support a much larger number of units, such as the one described in Section III-A.

¹The CAVIAR acronym stands for “Convolution Address-Event Representation (AER) Vision Architecture for Real-Time”. This was a four-year project funded by the European Union under the FP5-IST program in June 2002, within the “Lifelike perception systems” subprogram. Its main objective was to develop a bio-inspired multi-chip multi-layer hierarchical sensing/processing/actuation system where the chips communicate using an AER infrastructure.
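The per-event update described above can be sketched as follows. This is a minimal, software-only model of a single event-driven convolution module, written under assumptions that the text does not specify: states are floating-point, a pixel that crosses threshold emits one output event and is reset to zero, and there is no leak. The class and method names are hypothetical.

```python
from typing import List, Sequence, Tuple


class EventDrivenConv:
    """Hypothetical model of one event-driven convolution module."""

    def __init__(self, width: int, height: int,
                 kernel: Sequence[Sequence[float]], threshold: float) -> None:
        k = len(kernel)
        if k % 2 == 0 or any(len(row) != k for row in kernel):
            raise ValueError("kernel must be square with odd size")
        self.w, self.h = width, height
        self.kernel = kernel
        self.r = k // 2                    # kernel radius
        self.threshold = threshold
        # Pixel states persist in memory between asynchronous events.
        self.state = [[0.0] * width for _ in range(height)]
        self.updates = 0                   # number of state updates performed

    def on_event(self, x: int, y: int) -> List[Tuple[int, int]]:
        """Project the kernel around input pixel (x, y); return output events."""
        out = []
        for dy in range(-self.r, self.r + 1):
            for dx in range(-self.r, self.r + 1):
                px, py = x + dx, y + dy
                if 0 <= px < self.w and 0 <= py < self.h:
                    # One addition per update: a single adder can serve all of them.
                    self.state[py][px] += self.kernel[dy + self.r][dx + self.r]
                    self.updates += 1
                    if self.state[py][px] >= self.threshold:
                        out.append((px, py))      # emit output spike
                        self.state[py][px] = 0.0  # assumed reset-to-zero
        return out


if __name__ == "__main__":
    conv = EventDrivenConv(64, 64, [[0.5] * 11 for _ in range(11)], threshold=1.0)
    print(len(conv.on_event(32, 32)), len(conv.on_event(32, 32)), conv.updates)
```

With an 11x11 kernel, an event far from the border touches exactly 121 states, matching the maximum quoted above. The state array is kept for as long as the module runs, while the per-event arithmetic is a short sequence of additions: more persistent memory, but a single adder.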

C. Deep Belief Networks 

Deep Belief Networks (DBNs), first introduced by Hinton and colleagues [61], are a special class of deep neural networks with generative properties. They are composed of stacked Restricted Boltzmann Machines (RBMs), each formed by a pair of interconnected layers (see Fig. 2 (b)). They have also been used in a variety of benchmarking tasks [38]. An adaptation of the neural model to allow transfer of parameters to a 784-500-500-10 layer spiking DBN was described in [27], with good performance on the MNIST digit database [62]. This network architecture has been implemented on a Xilinx Spartan-6 LX150 FPGA [43] with very similar classification performance (92%) on the same MNIST database. This FPGA implementation of the DBN (also called Minitaur) contains 32 parallel cores and 128 MB of DDR2 as main memory (see Fig. 4). Each core has 2048 B of state cache, 8192 B of weight cache, and 2 Digital Signal Processors (DSPs) for performing fixed-point math. Because of the typical all-to-all connections from the neurons of one layer to the next projection layer, memory for storing the weights of these connections is critical. The cache locality of each of the 32 cores is critical for optimizing the neuron weight and state lookups. The connection rule lookup block in Fig. 4 specifies how an incoming spike is projected to a set of neurons, and the connection manager block distributes the updates of the neurons across the 32 parallel cores. This system is scalable, but with limitations imposed by the number of available logic gates. On the Spartan-6 used, the system can support up to 65,536 integrate-and-fire neurons.

Fig. 4: Simplified architecture of the Minitaur system. Each core has 2048 B of state cache, 8192 B of weight cache, and 2 DSPs: one for multiplying the decay, one for summation of the input current. The darker gray blocks indicate use of Block RAMs (BRAMs). Events can be streamed from the computer via the input buffer and added to the event queue, which also holds the events from the output neurons in the Minitaur Core. These events are then streamed out through the output buffer or to the neurons in the Core using the Connection Manager block. Adapted from [43].
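To make the role of the two DSPs per core concrete, the sketch below performs a fixed-point leaky integrate-and-fire update: one multiplication applies the decay factor and one addition integrates the input current. The Q16.16 number format, the reset-to-zero rule, and the function names are assumptions for illustration; the actual Minitaur data path in [43] may differ.

```python
from typing import List, Tuple

FRAC_BITS = 16          # assumed Q16.16 fixed-point format
ONE = 1 << FRAC_BITS


def to_fixed(x: float) -> int:
    """Convert a real value to the assumed fixed-point representation."""
    return int(round(x * ONE))


def lif_update(v: int, input_current: int, decay: int, threshold: int) -> Tuple[int, bool]:
    """One fixed-point LIF update; returns (new membrane value, spiked?)."""
    v = (v * decay) >> FRAC_BITS   # multiplier: leak (arithmetic shift floors negatives)
    v = v + input_current          # adder: integrate the input current
    if v >= threshold:
        return 0, True             # spike and reset (assumed reset-to-zero)
    return v, False


def simulate(inputs: List[int], decay: int, threshold: int) -> List[int]:
    """Run one neuron over a sequence of inputs; return the steps at which it spiked."""
    v, spikes = 0, []
    for step, i_in in enumerate(inputs):
        v, fired = lif_update(v, i_in, decay, threshold)
        if fired:
            spikes.append(step)
    return spikes


if __name__ == "__main__":
    print(simulate([to_fixed(0.6)] * 6, decay=to_fixed(0.5), threshold=to_fixed(1.0)))
    # [2, 5]
```

Each neuron update costs one multiply and one add, so it maps naturally onto a pair of DSP blocks, while the membrane value lives in the per-core state cache.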

The Minitaur system, like other deep network systems, operates by construction in a massively parallel way, with each unit processing local data and using local memory resources. In order to efficiently map these architectures onto FPGAs, DSPs, or classical von Neumann systems, it is necessary to develop custom processing and memory-optimized routing schemes that take these features into account [56], [63], [64], [43].
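A quick calculation shows why weight storage dominates the memory budget of the 784-500-500-10 DBN mentioned above. The sketch counts the all-to-all weights between consecutive layers (ignoring biases) and compares them with the total on-chip weight cache of Minitaur's 32 cores. The 16-bit weight width is an assumption, not a figure from [43].

```python
from typing import Sequence

LAYER_SIZES = [784, 500, 500, 10]   # spiking DBN from the text
N_CORES = 32
WEIGHT_CACHE_PER_CORE = 8192        # bytes
DDR2_BYTES = 128 * 2**20            # 128 MB main memory, taken as MiB
BYTES_PER_WEIGHT = 2                # assumption: 16-bit weights


def all_to_all_weights(sizes: Sequence[int]) -> int:
    """Number of weights when every unit projects to every unit of the next layer."""
    return sum(a * b for a, b in zip(sizes, sizes[1:]))


n_weights = all_to_all_weights(LAYER_SIZES)
weight_bytes = n_weights * BYTES_PER_WEIGHT
cache_bytes = N_CORES * WEIGHT_CACHE_PER_CORE

if __name__ == "__main__":
    print(f"weights: {n_weights}")
    print(f"weight storage: {weight_bytes / 2**20:.2f} MiB")
    print(f"on-chip weight cache: {cache_bytes / 2**10:.0f} KiB")
    print(f"storage/cache ratio: {weight_bytes / cache_bytes:.1f}")
```

Under this assumption the 647,000 weights need about five times more space than all weight caches combined, so most of them must reside in the DDR2 main memory and be streamed to the cores, which is why cache locality matters.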


III. Large-scale models of neural systems

While dedicated implementations of convolutional and deep networks can be extremely useful for specific application domains such as visual processing and pattern recognition, they do not offer a computational substrate that is general enough to model information processing in complex biological neural systems. To achieve this goal, it is necessary to develop spiking neural network architectures with some degree of flexibility, such as the possibility to configure the network connectivity, the network parameters, or even the models of the network’s constituent elements (e.g., the neurons and synapses). A common approach, which allows a high degree of flexibility and is closely related to the ones used for implementing convolutional and deep networks, is to implement generic spiking neural network architectures using off-the-shelf FPGA devices [64], [65], [66]. Such devices can be extremely useful for relatively rapid prototyping and testing of neural model characteristics because of their programmability. However, these devices, developed to implement conventional logic architectures with small numbers of input (fan-in) and output (fan-out) ports, do not allow designers to make dramatic changes to the system’s memory structure, leaving the von Neumann bottleneck problem largely unsolved. The next level of complexity in the quest to implement brain-like neural information processing systems is to design custom ASICs using a standard digital design flow [67]. Further customization can be achieved by combining a standard digital design flow for the processing elements with custom asynchronous routing circuits for the communication infrastructure.

A. SpiNNaker 

This is the current approach followed by the SpiNNaker² project [22]. The SpiNNaker system is a multi-core computer designed with the goal of simulating the behavior of up to a billion neurons in real time. It is planned to integrate 57,600 custom VLSI chips, interfaced with each other via a dedicated global asynchronous communication infrastructure based on the AER communication protocol [68], [69], [70] that supports large fan-in and large fan-out connections, and that has been optimized to carry very large numbers of small packets (e.g., representing neuron spikes) in real time. Each SpiNNaker chip is a “System-in-Package” device that contains a VLSI die integrating 18 fixed-point Advanced RISC Machine (ARM) ARM968 cores together with the custom routing infrastructure circuits, and a 128 MB Dynamic Random Access Memory (DRAM) die. In addition to the off-chip DRAM, the chip integrates the router memory, consisting of a 1024x32 ternary (3-state) Content Addressable Memory (CAM) and a 1024x24 bit Random Access Memory (RAM) module. Furthermore, each ARM core within the chip comprises 64 KB of data memory and 32 KB of instruction memory.
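The router memory described above supports multicast routing by masked key matching. The sketch below models a table of (key, mask, route) entries: a packet's routing key matches an entry when the key bits selected by the mask equal the entry key, and the route word of the first matching entry selects the outgoing inter-chip links and local cores. The layout of the 24-bit route word (assumed here to be 6 link bits followed by 18 core bits, consistent with the 24-bit RAM width and the 18 cores per chip), the first-match priority, and all names are assumptions made for illustration; the behavior when no entry matches is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

N_LINKS = 6                      # assumed number of inter-chip links per chip
N_CORES = 18                     # cores per chip, as stated in the text
ROUTE_BITS = N_LINKS + N_CORES   # 24, matching the 1024x24 bit route RAM


@dataclass(frozen=True)
class RouteEntry:
    key: int    # 32-bit value to match
    mask: int   # 32-bit mask; bits set to 0 are "don't care"
    route: int  # 24-bit vector: bits 0-5 links, bits 6-23 cores (assumed layout)


class MulticastRouter:
    """Toy model of a key/mask multicast routing table (hypothetical API)."""

    def __init__(self, capacity: int = 1024) -> None:
        self.capacity = capacity
        self.entries: List[RouteEntry] = []

    def add(self, entry: RouteEntry) -> None:
        if len(self.entries) >= self.capacity:
            raise OverflowError("routing table full")
        if entry.key & ~entry.mask:
            raise ValueError("key has bits set outside the mask")
        if not 0 <= entry.route < (1 << ROUTE_BITS):
            raise ValueError("route word wider than 24 bits")
        self.entries.append(entry)

    def lookup(self, packet_key: int) -> Optional[int]:
        # The CAM compares all entries in parallel; here the first match wins.
        for entry in self.entries:
            if (packet_key & entry.mask) == entry.key:
                return entry.route
        return None  # no match: fallback behavior not modeled


def make_route(links: List[int], cores: List[int]) -> int:
    """Build a route word from link and core indices."""
    word = 0
    for link in links:
        word |= 1 << link
    for core in cores:
        word |= 1 << (N_LINKS + core)
    return word


def decode_route(route: int) -> Tuple[List[int], List[int]]:
    """Split a route word back into link and core indices."""
    links = [i for i in range(N_LINKS) if (route >> i) & 1]
    cores = [i for i in range(N_CORES) if (route >> (N_LINKS + i)) & 1]
    return links, cores


if __name__ == "__main__":
    router = MulticastRouter()
    # Spikes with keys 0x0001xxxx go to link 2 and to local cores 3 and 4.
    router.add(RouteEntry(key=0x00010000, mask=0xFFFF0000, route=make_route([2], [3, 4])))
    print(decode_route(router.lookup(0x00010042)))
```

A single entry can fan a whole population of source neurons out to several destinations, which is how a small table can support the large fan-out mentioned above.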

SpiNNaker represents a remarkable platform for fast simulations of large-scale neural computational models. It can implement networks with arbitrary connectivity and a wide variety of neuron, synapse, and learning models (or other algorithms not necessarily related to neural networks). However, the more complex the models used, the fewer elements can be simulated in the system. In addition, this system is built using standard von Neumann computing blocks, such as the ARM cores in each chip. As a consequence, it uses to a large extent the same memory hierarchies and structures found in conventional computers (as in Fig. 1b), and does not provide a computing substrate that can solve the von Neumann bottleneck problem [6].
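A back-of-the-envelope calculation, using only the figures quoted above, illustrates the trade-off between model complexity and network size. It assumes, for simplicity, that every one of the 18 cores on every chip simulates neurons and that all neuron state must fit in the 64 KB of per-core data memory. In practice some cores and some of that memory are likely to be needed for other purposes, which would tighten the budget further. The per-neuron state sizes in the example are hypothetical.

```python
CHIPS = 57_600
CORES_PER_CHIP = 18
DATA_MEMORY_PER_CORE = 64 * 1024    # bytes
TARGET_NEURONS = 1_000_000_000

total_cores = CHIPS * CORES_PER_CHIP                                 # 1,036,800
neurons_per_core = TARGET_NEURONS / total_cores                      # ~964.5
bytes_per_neuron_budget = DATA_MEMORY_PER_CORE / neurons_per_core    # ~68


def max_neurons_per_core(state_bytes_per_neuron: int,
                         usable_bytes: int = DATA_MEMORY_PER_CORE) -> int:
    """Upper bound on neurons per core if each needs this many bytes of local state."""
    if state_bytes_per_neuron <= 0:
        raise ValueError("state size must be positive")
    return usable_bytes // state_bytes_per_neuron


if __name__ == "__main__":
    print(f"cores: {total_cores}, neurons/core: {neurons_per_core:.1f}, "
          f"bytes/neuron: {bytes_per_neuron_budget:.1f}")
    for state_bytes in (16, 64, 256):   # hypothetical simple vs. richer neuron models
        print(state_bytes, "B/neuron ->", max_neurons_per_core(state_bytes), "neurons/core")
```

Reaching a billion neurons leaves each simulated neuron well under 70 bytes of fast local memory; a model needing 256 bytes of state would cut the per-core count to 256.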

²SpiNNaker is a contrived acronym derived from Spiking Neural Network Architecture. The SpiNNaker project is led by Prof. Steve Furber at Manchester University. It started in 2005 and was initially funded by a UK government grant until early 2014. It is currently used as the “many-core” Neuromorphic Computing Platform for the EU FET Flagship Human Brain Project.

³The development of the TrueNorth IBM chip was funded by the US DARPA “SyNAPSE” program, starting in November 2008.

B. TrueNorth

The recent implementation of a full custom spiking neural network ASIC by IBM named “TrueNorth”³ represents a radical departure from classical von Neumann architectures [21]. Although the electronic circuits of TrueNorth use transistors as digital gates, they are fully asynchronous and communicate using event-driven methods. The overall architecture consists of 4096 cores of spiking neural networks integrated into a single CMOS chip. Each core comprises 256 digital leaky integrate-and-fire neuron circuits, 256x256 binary programmable synaptic connections, and asynchronous encoding, decoding and routing circuits. Synaptic events can be assigned one of three possible strengths (e.g., to model one type of inhibitory synapse and two excitatory ones with different weights), but they are instantaneous pulses with no temporal dynamics. The dynamics of the neurons are discretized into 1 ms time steps set by a global 1 kHz clock. Depending on the core’s synaptic matrix, the source neuron can target from one up to 256 neurons of a destination core. These routing schemes are not as flexible as in the SpiNNaker system but, as opposed to SpiNNaker, this architecture distributes the system memory, consisting of the core synaptic matrix and the routing table entries, across the whole network. The system is inherently parallel, distributed, modular, and (by construction) fault-tolerant. The cost of this very high parallelism, however, is relative density inefficiency: the chip, fabricated using an advanced 28 nm CMOS process, occupies an area of 4.3 cm², and all unused synapses in a given application represent “dark silicon” (silicon area occupied by unused circuits). Note that since space is also precious real estate in biology, unused synapses are typically removed by a dynamic process (see structural plasticity in Section IV-A). In the TrueNorth chip the synapses do not implement any plasticity mechanism, so they cannot perform on-line learning or form memories. As a consequence, the goal of co-localizing memory and computation to mitigate the von Neumann bottleneck problem is only partially achieved.
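The core organization described above can be sketched as a time-stepped model. On each 1 ms tick, every active input axon adds, to each neuron it is connected to in the binary crossbar, the weight that neuron associates with the axon's type; every neuron then leaks and is compared to its threshold. This is a simplified illustration under assumptions not stated in the text (integer membrane values, a linear leak clipped at zero, reset to zero after a spike), and the class is hypothetical, not IBM's programming interface. The constants at the top only restate the chip-level totals implied by 4096 cores of 256 neurons and 256x256 synapses each.

```python
from typing import List, Sequence

CORES, NEURONS_PER_CORE, AXONS_PER_CORE = 4096, 256, 256
TOTAL_NEURONS = CORES * NEURONS_PER_CORE                     # 1,048,576
TOTAL_SYNAPSES = CORES * AXONS_PER_CORE * NEURONS_PER_CORE   # 268,435,456


class CrossbarCore:
    """Simplified TrueNorth-like core (hypothetical model, not the real chip API)."""

    def __init__(self, crossbar: Sequence[Sequence[int]], axon_type: Sequence[int],
                 type_weights: Sequence[Sequence[int]], leak: int, threshold: int) -> None:
        self.crossbar = crossbar          # crossbar[axon][neuron] in {0, 1}
        self.axon_type = axon_type        # axon_type[axon] indexes type_weights[neuron]
        self.type_weights = type_weights  # per-neuron weight for each axon type
        self.leak = leak
        self.threshold = threshold
        self.n_neurons = len(type_weights)
        self.v: List[int] = [0] * self.n_neurons

    def tick(self, active_axons: Sequence[int]) -> List[int]:
        """Advance one time step; return the indices of neurons that spiked."""
        for a in active_axons:            # instantaneous synaptic events
            t = self.axon_type[a]
            for n, connected in enumerate(self.crossbar[a]):
                if connected:
                    self.v[n] += self.type_weights[n][t]
        spikes = []
        for n in range(self.n_neurons):
            self.v[n] = max(0, self.v[n] - self.leak)   # assumed linear leak, clipped at 0
            if self.v[n] >= self.threshold:
                spikes.append(n)
                self.v[n] = 0                           # assumed reset after spike
        return spikes


if __name__ == "__main__":
    core = CrossbarCore(crossbar=[[1, 1, 0], [0, 1, 0], [0, 0, 1]], axon_type=[0, 1, 2],
                        type_weights=[[2, -1, 1]] * 3, leak=0, threshold=2)
    print(core.tick([0]), core.tick([0, 1]), core.v)
```

In this toy model, as in the description above, the only state carried between ticks is the membrane value; synaptic events themselves have no temporal dynamics.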


C. NeuroGrid 

Similar to SpiNNaker and TrueNorth, the goal of the NeuroGrid⁴ system [25] is to implement large-scale neural models and to emulate their function in real time. Unlike the previous two approaches, NeuroGrid follows the original neuromorphic engineering vision [1], [2] and uses analog/digital mixed-signal sub-threshold circuits to model continuous-time neural processing elements. In particular, important synapse and neuron functions, such as exponentiation, thresholding, integration, and temporal dynamics, are directly emulated using the physics of FETs biased in the sub-threshold regime [12].

NeuroGrid consists of a board with 16 standard CMOS
“NeuroCore” chips connected in a tree network, with each
NeuroCore consisting of a 256x256 array of two-compartment
neurons (see Fig. 5). Synapses are “shared” among the neurons
by using the same synapse circuit for different spike sources.
Multiple spikes can be superimposed in time onto a single
synapse circuit, because it has been designed as a linear
integrator filter, and no non-linear effects are modeled. Each
neuron in the array can target multiple destinations thanks to
an asynchronous multi-cast tree routing digital infrastructure.

^The NeuroGrid project was developed by the group of Prof. Kwabena
Boahen at Stanford, and was funded by the US NIH Pioneer Award granted
to Boahen in 2006.

Fig. 5: NeuroCore chip block diagram (adapted from [25]).
The chip comprises a 256x256 array of neuron elements,
an asynchronous digital transmitter for sending the events
generated by the neurons, a receiver block for accepting
events from other sources, a router block for communicating
packets among chips, and memory blocks for supporting
different network configurations. The neuron block comprises
four different types of synapse analog circuits that integrate
the incoming digital events into analog currents over time,
four analog gating variable circuits that model the ion channel
population dynamics, a soma circuit that generates the neuron
output spikes, and a dendritic circuit that integrates the
synaptic currents over space, from neighboring neurons.

The number of target destinations that a
neuron can reach is limited by the size of the memory used in
external routing tables and by its access time [71]. However,
NeuroGrid increases the fan-out of each neuron by connecting
neighboring neurons with local resistive networks or diffusors
that model synaptic gap-junctions [72]. This structured
synaptic organization is modeled after the layered organization
of neurons within cortical columns [73]. The full NeuroGrid
board can therefore implement models of cortical networks
of up to one million neurons and billions of synaptic
connections, with sparse long-range connections and dense local
connectivity profiles. Like TrueNorth, NeuroGrid represents a
radical departure from the classical von Neumann computing 
paradigm. Different memory structures are distributed across 
the network (e.g., in the form of routing tables, parameters, and 
state variables). The ability of the shared synapses to integrate
incoming spikes, reproducing biologically plausible dynamics,
provides the system with computational primitives that can
hold and represent the system state for tens to hundreds
of milliseconds. However, the design choice to use linear
synapses in the system excluded the possibility of implementing
synaptic plasticity mechanisms at each synapse, and therefore
limits the ability of NeuroGrid to model on-line learning or
adaptive algorithms without the aid of additional external
computing resources.
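The reason a single linear synapse circuit can be shared among many spike sources is superposition: for a linear filter, the response to the merged spike train equals the sum of the responses to each train. A minimal Python check, assuming a first-order exponential synapse with a 20 ms time constant (the function `linear_synapse` and all values are illustrative, not NeuroGrid circuit parameters):

```python
import math


def linear_synapse(spike_times, weights, tau=0.02, dt=1e-4, t_end=0.2):
    """First-order linear low-pass filter driven by weighted spikes (toy model).

    Each spike adds its weight to the synaptic current, which then decays
    exponentially with time constant tau (seconds). Returns the sampled trace.
    """
    decay = math.exp(-dt / tau)
    n_steps = int(round(t_end / dt))
    kicks = [0.0] * n_steps
    for t, w in zip(spike_times, weights):
        kicks[int(round(t / dt))] += w
    current, trace = 0.0, []
    for k in range(n_steps):
        current = current * decay + kicks[k]
        trace.append(current)
    return trace


# Two hypothetical spike sources sharing one synapse circuit.
times_a, weights_a = [0.010, 0.030], [1.0, 1.0]
times_b, weights_b = [0.015, 0.030], [0.5, 0.5]

shared = linear_synapse(times_a + times_b, weights_a + weights_b)
separate = [x + y for x, y in zip(linear_synapse(times_a, weights_a),
                                  linear_synapse(times_b, weights_b))]
max_diff = max(abs(x - y) for x, y in zip(shared, separate))
print(max_diff)
```

`max_diff` is zero up to floating-point rounding. A saturating or otherwise non-linear synapse would break this identity, and a shared circuit cannot hold a separate, individually plastic weight for each source, which is consistent with the trade-off described above.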

NeuroGrid has been designed to implement cortical models 
of computation that run in real-time, and has been used in 
a closed-loop brain-machine application [74] and to control 
articulated robotic agents [75]. In this system time represents
itself [9], and data is processed on the fly, as it arrives.
Computation is data driven and signals are consumed as they are
received. Unlike in conventional von Neumann architectures,
time is not “virtualized”: signals are not time-stamped and
there are no means to store the current state of processing or
to transfer time-stamped partial results of signal processing
operations to external memory banks for later consumption.
Memory and computation are expressed in the dynamics of
the circuits, and in the way they are interconnected. So it is
important that the system’s memory and computing resources
have time constants that are well matched to those of the
signals they process. As the goal is to interact with the
environment and process natural signals with biological time-scales,
these circuits use biologically realistic time constants, which
are extremely long (e.g., tens of milliseconds) compared
with the ones used in typical digital circuits. Such long time
constants are not easy to achieve using conventional
analog VLSI design techniques. Achieving this goal, while
minimizing the size of the circuits (to maximize density),
is possible only if one uses extremely small currents, such
as those produced by transistors biased in the sub-threshold
domain [12], [4], as is done in NeuroGrid.
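Why sub-threshold currents are needed can be seen from the first-order approximation commonly used for sub-threshold log-domain integrator circuits, tau = C·U_T / (kappa·I_tau), where C is the integrating capacitance, U_T the thermal voltage (about 25 mV at room temperature), kappa the sub-threshold slope factor, and I_tau the bias current. The sketch below assumes C = 1 pF and kappa = 0.7; both values, and the function names, are illustrative.

```python
def tau_from_current(i_tau, c=1e-12, u_t=0.025, kappa=0.7):
    """Time constant (s) of a sub-threshold log-domain integrator,
    using the first-order approximation tau = C * U_T / (kappa * I_tau)."""
    return c * u_t / (kappa * i_tau)


def current_for_tau(tau, c=1e-12, u_t=0.025, kappa=0.7):
    """Bias current (A) needed to obtain a desired time constant tau (s)."""
    return c * u_t / (kappa * tau)


for tau in (0.01, 0.1):
    print(f"tau = {tau * 1e3:.0f} ms needs I_tau = {current_for_tau(tau) * 1e12:.2f} pA")
print(f"I_tau = 1 uA gives tau = {tau_from_current(1e-6) * 1e9:.1f} ns")
```

With these assumptions, time constants of 10 ms and 100 ms require bias currents of roughly 3.6 pA and 0.36 pA, whereas a 1 µA current, typical of above-threshold operation, gives about 36 ns. Larger capacitors would relax the current requirement, but at the cost of silicon area.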

D. BrainScales 

Another approach for simulating large-scale neural models 
is the one being pursued in the BrainScales^ project [76]. 
BrainScales aims to implement a wafer-scale neural simulation
platform, in which each 8 inch silicon wafer integrates 50 × 10^6
plastic synapses and 200,000 biologically realistic neuron
circuits. The goal of this project is to build a custom mixed-signal
analog/digital simulation engine that can accurately implement
the differential equations of the computational neuroscience
models provided by neuroscientists, and reproduce the
results obtained from numerical simulations run on standard
computers as faithfully as possible. For this reason, in an
attempt to improve the precision of the analog circuits, the 
BrainScales engineers chose to use the above-threshold, or 
strong-inversion, regime for implementing models of neurons 
and synapses. However, in order to maximize the number 
of processing elements in the wafer, they chose to implement
relatively small capacitors for modeling the synapse
and neuron capacitances. As a consequence, given the large
currents produced by the above-threshold circuits and the small
capacitors, the BrainScales circuits cannot achieve the long
time constants required for interacting with the environment
in real-time. Rather, their dynamics are “accelerated” with
respect to typical biological times by a factor of 10^3 or
10^4. This has the advantage of allowing very fast simulation
times, which can be useful, e.g., to investigate the evolution
of network dynamics over long periods of time, once all
the simulation and network configuration parameters have
been uploaded to the system. But it has the disadvantage of
requiring very large bandwidths and fast, high-power digital
circuits for transmitting and routing the spikes across the
network.
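The trade-off between fast simulation and communication bandwidth is simple arithmetic. The sketch below uses the 200,000 neurons per wafer quoted above together with an assumed mean biological firing rate of 10 Hz and an assumed speed-up factor of 10^4; the function names are hypothetical.

```python
def wall_clock_time(bio_seconds, speedup):
    """Wall-clock seconds needed to emulate `bio_seconds` of biological time."""
    return bio_seconds / speedup


def routed_event_rate(n_neurons, mean_rate_hz, speedup):
    """Spike events per wall-clock second that the routing fabric must carry."""
    return n_neurons * mean_rate_hz * speedup


N_NEURONS = 200_000   # neurons per wafer, as quoted in the text
MEAN_RATE_HZ = 10     # assumed mean biological firing rate
SPEEDUP = 10_000      # assumed acceleration factor

print(wall_clock_time(24 * 3600, SPEEDUP))                  # one biological day, in s
print(routed_event_rate(N_NEURONS, MEAN_RATE_HZ, SPEEDUP))  # events per second
```

Under these assumptions one biological day is emulated in 8.64 s, but the routing fabric must carry 2×10^10 spike events per second instead of the 2×10^6 events per second a real-time system would need, hence the fast, power-hungry digital communication circuits.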

^The BrainScales acronym stands for “Brain-inspired multiscale computation
in neuromorphic hybrid systems”. This project was funded by the
European Union under the FP7-ICT program, and started in January 2011. It 
builds on the research carried out in the previous EU FACETS (Fast Analog 
Computing with Emergent Transient States) project, and is now part of the EU 
FET Flagship Human Brain Project (HBP). The mixed signal analog-digital 
system being developed in this project currently represents the “physical 
model” Neuromorphic Computing Platform of the HBP. 


Like NeuroGrid, the synaptic circuits in BrainScales express
temporal dynamics, so they form memory elements that can
store the state of the network (albeit only for a few hundred
microseconds). In addition, the BrainScales synapses also
comprise circuits endowed with spike-based plasticity
mechanisms that allow the network to learn and form memories [77].
BrainScales therefore implements many of the principles that
are needed to build brain-inspired information processing
systems that can replace or complement conventional von
Neumann computing systems. But given the circuit design
choices made for maximizing precision in reproducing
numerical simulation results of given differential equations, the
system is neither low power (e.g., when compared to the other
large-scale neural processing systems previously described),
nor compact. To build neural information processing systems
that are at the same time compact, low-power, and robust, it
will be necessary to follow an approach that uses extremely
compact devices (such as nano-scale memristive synapse
elements), adopts very low-power circuit design approaches
(e.g., sub-threshold current-mode designs), and relies on
adaptation and learning techniques that can compensate, at the
system level, for the variability and inhomogeneity present in
the circuits.

IV. Adaptation, learning, and working-memory 

Adaptation and learning mechanisms in neural systems are 
mediated by multiple forms of “plasticity”, which operate 
on a wide range of time-scales [78]. The most common 
forms are Structural Plasticity, Homeostatic Plasticity, Long 
Term Potentiation (LTP) and Long Term Depression (LTD) 
mechanisms, and Short-Term Plasticity (STP) mechanisms. 
While these mechanisms are related to the ability of single 
neurons and synapses to form memories, the term Working 
Memory is often used to refer to the ability of full networks 
of neurons to temporarily store and manipulate information. 
A common model that has been proposed to explain the 
neural basis of working memory is that based on “Attractor 
Networks” [79], [80], [81], [82]. In this section we give a 
brief overview of single-neuron and network-level mechanisms
observed in biology that subserve the function of memory
formation, and of the neuromorphic engineering approaches
that have been followed to implement them in electronic
circuits. Examples of analogies between the memory structures
in conventional von Neumann architectures and the plasticity
and memory mechanisms found in neural and neuromorphic
systems are shown in Table I.

A. Plasticity 

Structural plasticity refers to the brain’s ability to make 
physical changes in its structure as a result of learning and 
experience [83], [84]. This mechanism, which typically operates
on very long time scales, ranging from minutes to days or
more, is important for the formation and maintenance of
long-term memories. Homeostatic plasticity is a self-stabilizing
mechanism that is used to keep the activity of the neurons
within proper operating bounds [85]. It is a process that
typically operates on relatively long time scales, ranging from
hundreds of milliseconds to hours.

TABLE I: Memory structures in computers versus memory structures in brains and neuromorphic systems.

Relative time scale | Computers | Brains and Neuromorphic Systems
Short | Fast local memory (e.g., registers and cache memory banks inside the CPU) | Synapse and neural dynamics at the single neuron level (short-term plasticity, spike-frequency adaptation, leak, etc.)
Medium | Main memory (e.g., dynamic RAM) | Spike-based plasticity (STDP, Hebbian learning, etc.); formation of working memory circuits (e.g., recurrent networks, attractors, etc.)
Long | Peripheral memory (e.g., hard-disks) | Structural plasticity, axonal growth, long-term changes in neural pathways

From the computational
point of view, this mechanism plays a crucial role in adapting
to the overall activity of the network, while controlling its
stability. Short-term plasticity, on the other hand, is a process
that typically operates on time scales that range from fractions
of milliseconds to hundreds of milliseconds. It can manifest
itself as either short-term facilitation or short-term depression,
whereby the strength of a synapse connecting a pre-synaptic
(source) neuron to a post-synaptic (destination) one is
up-regulated or down-regulated, respectively, with each spike [86].
It has been demonstrated that this mechanism can play a 
fundamental role in neural computation [81], [87], [88], e.g., 
for modulating the neuron’s sensitivity to its input signals. 
Finally, long-term plasticity is the mechanism responsible 
for producing long-lasting, activity dependent changes in the 
synaptic strength of individual synapses [89]. A popular class
of LTP mechanisms that has been the subject of widespread
interest within the neuroscience community [90], the
neuromorphic community [91], and more recently the materials
science and nano-technology community [92], [20], is based
on the Spike-Timing Dependent Plasticity (STDP) rule [93],
[94]. In its simplest form, the relative timing between the
pre- and post-synaptic spikes determines how to update the
efficacy of a synapse. In more elaborate forms, other factors
are taken into account, such as the average firing rate of the
neurons [95], their analog state (e.g., the neuron’s membrane
potential) [96], [97], and/or the current value of the synaptic
weights [98], [99]. The time-scales involved in the STDP
timing window for the single weight update are of the order
of tens to hundreds of milliseconds. But the LTP and LTD
changes induced in the synaptic weights persist over much longer
time scales, ranging from hours to weeks [100].
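The simplest form of the rule, in which only the relative timing of pre- and post-synaptic spikes matters, is often modeled with an exponential timing window. The following Python sketch is one such pair-based formulation; the amplitudes `a_plus`, `a_minus` and the 20 ms time constants are illustrative assumptions, and none of the rate-, voltage- or weight-dependent extensions mentioned above are included.

```python
import math


def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012,
            tau_plus=0.02, tau_minus=0.02):
    """Weight change for one pre/post spike pair (times in seconds).

    Pre before post (t_post > t_pre) potentiates, post before pre depresses;
    the effect decays exponentially with the spike-time difference.
    """
    delta = t_post - t_pre
    if delta > 0:
        return a_plus * math.exp(-delta / tau_plus)
    if delta < 0:
        return -a_minus * math.exp(delta / tau_minus)
    return 0.0


def apply_stdp(w, pre_spikes, post_spikes, w_min=0.0, w_max=1.0):
    """Sum the contributions of all pre/post pairs, then clip the weight."""
    for t_pre in pre_spikes:
        for t_post in post_spikes:
            w += stdp_dw(t_pre, t_post)
    return min(max(w, w_min), w_max)


# A causal pairing (pre leads post by 5 ms) strengthens the synapse,
# an anti-causal one (post leads pre by 5 ms) weakens it.
print(apply_stdp(0.5, [0.100], [0.105]))
print(apply_stdp(0.5, [0.105], [0.100]))
```

Making `a_minus` slightly larger than `a_plus` is a common choice that biases uncorrelated activity toward depression; the clipping in `apply_stdp` stands in for the bounded weights of physical synapses.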


B. Attractor networks 

Mechanisms operating at the network level can also allow
neural processing systems to form short-term memories,
consolidate long-term ones, and carry out non-linear processing
functions such as selective amplification (e.g., to implement
attention and decision making). An example of such a
network-level mechanism is provided by “attractor networks”. These
are networks of neurons that are recurrently connected via
excitatory synapses, and that can settle into stable patterns of
firing even after the external stimulus is removed. Different
stimuli can elicit different stable patterns, which consist of
specific subsets of neurons firing at high rates. Each of
the high-firing-rate attractor states can represent a different
memory [80]. To make an analogy with conventional logic
structures, a small attractor network with two stable states
would be equivalent to a flip-flop in CMOS logic.

Fig. 6: Soft Winner-Take-All network behaviors: linear (top
row) and non-linear (bottom row). The horizontal axis of
each trace represents the spatial location of the neurons in
the network, while the vertical axis represents the neurons’
response amplitude. Figure adapted from [101].
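The flip-flop analogy can be illustrated with a single excitatory population described by a rate model with strong recurrent excitation, tau·dr/dt = -r + f(w·r + I_ext), where f is a sigmoid. The Python sketch below assumes w = 10 and a sigmoid centered at 5, which gives two stable states (low and high activity): a brief excitatory pulse "sets" the network, the high state persists after the pulse is removed, and a brief inhibitory pulse "resets" it. The function names and all parameter values are illustrative.

```python
import math


def sigmoid(x, theta=5.0, slope=1.0):
    """Population gain function (assumed): half-maximal output at x = theta."""
    return 1.0 / (1.0 + math.exp(-(x - theta) / slope))


def run_population(external_input, w_rec=10.0, tau=0.01, dt=0.001, r0=0.0):
    """Euler integration of tau * dr/dt = -r + sigmoid(w_rec * r + I_ext).

    `external_input` holds one input value per 1 ms time step; returns the
    normalized population rate r (between 0 and 1) at every step.
    """
    r, trace = r0, []
    for i_ext in external_input:
        r += dt / tau * (-r + sigmoid(w_rec * r + i_ext))
        trace.append(r)
    return trace


# "Set" with an excitatory pulse, then "reset" with an inhibitory pulse.
protocol = [0.0] * 100 + [6.0] * 50 + [0.0] * 200 + [-10.0] * 50 + [0.0] * 200
trace = run_population(protocol)
print(round(trace[99], 3), round(trace[349], 3), round(trace[-1], 3))
```

The state reached after each pulse persists because both fixed points of r = f(10 r), near 0.007 and near 0.993, are stable, while the one at r = 0.5 is unstable and separates their basins of attraction.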

A particularly interesting class of attractor networks is
that of soft Winner-Take-All (sWTA) neural networks. In these
networks, groups of neurons both cooperate and compete
with each other. Cooperation takes place between groups of
neurons spatially close to each other, while competition is
typically achieved through global recurrent patterns of inhibitory
connections [102]. When stimulated by external inputs, the
neurons excite their neighbors and the ones with the highest
response suppress all other neurons to win the competition.
Thanks to these competition and cooperation mechanisms, the
outputs of individual neurons depend on the activity of the
whole network and not just on their individual inputs [103].
Depending on their parameters and input signals, sWTA
networks can perform both linear and complex non-linear
operations [104] (see Fig. 6), and have been shown to possess
powerful computational properties for tasks involving
feature extraction, signal restoration and pattern classification [105].
Given their structure and properties, they have been proposed
as canonical microcircuits that can explain both the
neuroanatomy and the neurophysiology data obtained from
experiments in the mammalian cortex [106]. Such networks
have also been linked to the Dynamic Neural Fields (DNFs)
typically used to model behavior and cognition in autonomous
agents [107], [108].
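A minimal rate-based sketch of the cooperation and competition described above, assuming threshold-linear neurons on a ring of 32 units, local excitation from each unit to itself and its two nearest neighbors, and global inhibition proportional to the total activity. The functions `bump` and `soft_wta` and the weights `alpha`, `beta`, and `gamma` are illustrative choices, picked so that the largest eigenvalue of the symmetric recurrent weight matrix stays below one, which keeps this toy network stable.

```python
import math


def bump(n, center, amplitude, sigma=2.0):
    """Gaussian input profile centered on unit `center`."""
    return [amplitude * math.exp(-((i - center) ** 2) / (2 * sigma ** 2))
            for i in range(n)]


def soft_wta(inputs, alpha=0.4, beta=0.2, gamma=0.05, dt_over_tau=0.1, steps=1000):
    """Euler integration of a threshold-linear soft winner-take-all ring.

    Each unit excites itself (alpha) and its two nearest neighbors (beta),
    and all units receive the same inhibition, gamma times the total activity.
    """
    n = len(inputs)
    x = [0.0] * n
    for _ in range(steps):
        inhibition = gamma * sum(x)
        new_x = []
        for i in range(n):
            excitation = alpha * x[i] + beta * (x[i - 1] + x[(i + 1) % n])
            drive = max(inputs[i] + excitation - inhibition, 0.0)  # rectification
            new_x.append(x[i] + dt_over_tau * (drive - x[i]))
        x = new_x
    return x


N = 32
stimulus = [a + b for a, b in zip(bump(N, 8, 1.0), bump(N, 24, 0.7))]
response = soft_wta(stimulus)
print(f"input ratio  {stimulus[24] / stimulus[8]:.2f}")
print(f"output ratio {response[24] / response[8]:.2f}")
```

Both bumps receive the same inhibition, so the weaker one loses proportionally more of its drive: the output ratio between the two peaks falls below the input ratio of 0.7. This is the "soft" part of the competition; stronger inhibition or stronger recurrent amplification can push the network toward a hard winner-take-all in which the weaker bump is silenced entirely.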
C. Neuromorphic circuit implementations

For many years, the neuromorphic engineering community has been
building physical models of sWTA networks [109], [110], [111],
attractor networks [112], [113], and plasticity mechanisms [91]
that cover the full range of temporal and spatial scales
described in Section IV-A. For example,
several circuit solutions have been proposed to implement
short-term plasticity dynamics, using different types of
devices and following a wide range of design techniques [114],
[115], [116], [117], [118], [119]; a large set of spike-based
learning circuits has been proposed to model long-term
plasticity [120], [121], [122], [123], [77], [124], [125], [126],
[127], [128], [91]; multiple solutions have been proposed for
implementing homeostatic plasticity mechanisms [129], [130];
impressive demonstrations have been made showing the
properties of VLSI attractor networks [112], [113], [23], [4]; while
structural plasticity has been implemented both at the single 
chip level, with morphology learning mechanisms for dendritic 
trees [131] and at the system level, in multi-chip systems that 
transmit spikes using the AER protocol, by reprogramming or 
“evolving” the network connectivity routing tables stored in 
the digital communication infrastructure memory banks [132], 
[133]. While some of these principles and circuits have been 
adopted in the deep network implementations of Section II 
and in the large-scale neural network implementations of 
Section III, many of them still remain to be exploited, at 
the system and application level, for endowing neuromorphic 
systems with additional powerful computational primitives. 

V. A NEUROMORPHIC PROCESSOR 

An example of a recently proposed neuromorphic 
multi-neuron chip that integrates all of the mechanisms 
described in Section IV is the Reconfigurable On-line
Learning Spiking Neuromorphic Processor
(ROLLS neuromorphic processor)^ [134]. This device
implements a configurable spiking neural network using
slow sub-threshold neuromorphic circuits that directly
emulate the physics of real neurons and synapses, and
fast asynchronous digital logic circuits that manage the
event-based AER communication aspects as well as the
properties of the neural network. While the analog circuits
faithfully reproduce the neural dynamics and the adaptive 
and learning properties of neural systems, the asynchronous 
digital circuits provide a flexible means to configure both 
parameters of the individual synapse and neuron elements in 
the chip, as well as the connectivity of the full network. The 
goal of the approach followed in designing this device was 
not to implement large numbers of neurons or large-scale 
neural networks, but to integrate many non-linear synapses 
for exploring their distributed memory and information 
processing capabilities. Although the analog circuits in
the ROLLS neuromorphic processor are characterized by
susceptibility to noise, variability and inhomogeneous
properties (mainly due to device mismatch) [134], the multiple
types of plasticity mechanisms and the range of temporal
dynamics present in these circuits endow the system with a
set of collective and distributed computational operators that
allow it to implement a wide range of robust signal processing
and computing functions [134]. The device block diagram
and chip micrograph are depicted in Fig. 7. As is evident
from the chip micrograph in Fig. 7, most of the area of the
device is dedicated to the synapses, which are the site of
both memory and computation.

^The ROLLS neuromorphic processor device was designed at the Institute
of Neuroinformatics of the University of Zurich and ETH Zurich. Its development
was funded by the EU ERC “Neuromorphic Processors” (NeuroP) project,
awarded to Giacomo Indiveri in 2011.

Fig. 7: ROLLS neuromorphic processor: micrograph of a
neuromorphic processor chip that allocates most of its area to
non-linear synapse circuits for memory storage and distributed
massively parallel computing.

The chip, fabricated using a standard 6-metal 180 nm CMOS
process, occupies an area of 51.4 mm² and has approximately
12.2 million transistors. It comprises 256 neurons
and 133,120 synapses, equivalent to 130 K “memory” elements.
The synapse circuits are of three different types: linear
time-multiplexed (shared) synapses, STP synapses, and LTP 
synapses. The linear synapses are subdivided into blocks of 
four excitatory and four inhibitory synapse integrator circuits 
per neuron, with shared sets of synaptic weights and time 
constants. The STP synapses are arranged in four arrays of 
128x128 elements. Each of these elements has both analog 
circuits, that can reproduce short-term adaptation dynamics, 
and digital circuits, that can set and change the programmable 
weights. The LTP synapses are subdivided into four arrays 
of 128x128 elements which contain both analog learning 
circuits and digital state-holding logic. The learning circuits
implement the stochastic plasticity STDP model proposed
in [135] to update the synaptic weight upon the arrival of
every pre-synaptic input spike. Depending on the analog value
of the weight, the learning circuits also drive the weight to
either a high LTP state or a low LTD state on very long
time scales (i.e., hundreds of milliseconds), for long-term
stability and storage of the weights (see [134] for a thorough
description and characterization of these circuits). The digital
logic in the LTP synapse elements is used for configuring
the network connectivity. The silicon neuron circuits on the
right side of the layout of Fig. 7 implement a model of the
adaptive exponential Integrate-and-Fire (I&F) neuron [136]
that has been shown to be able to accurately reproduce 
electrophysiological recordings of real neurons [137], [138]. 
Additional circuits are included in the I&F neuron section
of Fig. 7 to drive the learning signals in the LTP arrays,
and to implement a self-tuning synaptic scaling homeostatic
mechanism [139] on very long time scales (i.e., seconds to
minutes) [130]. The currents produced by the synapse circuits
in the STP and LTP arrays are integrated by two independent
sets of low-power log-domain pulse integrator filters [140]
that can reproduce synaptic dynamics with time constants that
can range from fractions of microseconds to hundreds of
milliseconds. The programmable digital latches in the synapse
elements can be used to set the state of the available all-to-all
network connections, therefore allowing the user to configure
the system to implement arbitrary network topologies with
the available 256 on-chip neurons, ranging from multi-layer
deep networks, to recurrently connected reservoirs, to
winner-take-all networks. All analog parameters of synapses and
neurons can be configured via a temperature-compensated
programmable bias generator [141].
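The synapse counts given above are consistent with the quoted total, as a quick tally shows. The tally assumes, as our reading of the text, that each of the eight linear integrator circuits per neuron counts as one synapse.

```python
N_NEURONS = 256

linear = N_NEURONS * (4 + 4)  # 4 excitatory + 4 inhibitory linear integrators per neuron
stp = 4 * 128 * 128           # four 128x128 short-term plasticity arrays
ltp = 4 * 128 * 128           # four 128x128 long-term plasticity arrays
total = linear + stp + ltp

print(linear, stp, ltp, total)  # 2048 65536 65536 133120
print(total / 1024)             # 130.0
```

The total is exactly 130 × 1024, which is where the 130 K figure quoted above comes from.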

To demonstrate the neuromorphic processor’s memory and
information processing abilities, we trained the network to
encode memory patterns into four different attractor networks,
such as those described in Section IV-B (see Fig. 8). The protocol
followed to train the silicon neurons to form these associative
memories is the following: we created a recurrent competitive
network, by connecting the chip’s 256 neuron outputs to
90% of their 256x256 plastic LTP synapses, via recurrent
excitatory connections, and to a random subset of 50% of the
non-plastic STP inhibitory synapses, via recurrent inhibitory
connections; we initialized the LTP bi-stable plastic synapses
to a low state with 90% probability, and to a high state with
10% probability (see first plot of Fig. 8a); we configured
the parameters of the learning circuits to induce LTP and
LTD transitions in an intermediate range of firing rates (e.g.,
between 50 Hz and 150 Hz), and to stop changing the weights
for higher frequencies (see [134] for a detailed description of
the algorithm and circuits that implement these features); we
then stimulated four separate groups of neurons repeatedly, by
stimulating the non-plastic synapses with Poisson distributed
input spike trains, for one second each (see Fig. 8b). With
each stimulus presentation, the plastic synapses of the neurons 
receiving both feed-forward input from the Poisson input spike 
trains and recurrent feed-back from the output neurons tended 
to potentiate, while the plastic synapses of the neurons that 
did not receive feed-back spikes correlated with feed-forward 
inputs tended to depress (see second, third and fourth plots
of Fig. 8a). As the number of potentiated plastic synapses
increased, the populations of recurrently connected neurons
started to produce sustained activity, with higher firing rates
and more structured response properties (see second, third
and fourth plots of Fig. 8a). The attractors are fully formed
when enough recurrent connections potentiate, such that the
firing rate of the stimulated population is high enough to
stop the learning process and the population activity remains
sustained even after the input stimulus is removed (e.g., see
activity during t=1-2 s, t=3-4 s, t=5-6 s, and t=7-9 s in Fig. 8b).
When neurons belonging to different attractors are stimulated,
they suppress the activity of all other attractors via the recurrent
inhibitory connections. Figure 8c shows an example of a
silicon neuron output trace measured at the beginning of the
experiment, when no attractors exist, in the middle, as the
attractors are being formed, and at the end of the experiment,
during sustained activity of a fully formed attractor. Note that
although the analog circuits in the device have a high degree of
variability (with a coefficient of variation of about 10% [134]),
and although the learning process is stochastic [135], the attractors
are formed reliably, and their population response is sustained
robustly. Furthermore, the learning process is on-line and
always on. Thanks to the spike-based Perceptron-like features of
the learning circuits [134], [135], the synapses stop increasing
their weights once the attractors are fully formed, and can start
to change again and adapt to the statistics of the input, should
the input signals change. As discussed in Section IV-B, these
attractor networks represent an extremely powerful
computational primitive that can be used to implement sophisticated
neuromorphic processing modules. In previous work [101], [4]
we showed how the neuron, synapse, and learning circuits of 
the ROLLS neuromorphic processor can be used to implement 
other types of powerful computational primitives, such as the 
sWTA network of Section IV. In [142] we showed how these 
primitives can be combined to synthesize fully autonomous 
neuromorphic cognitive agents able to carry out context- 
dependent decision making in an experiment analogous to the 
ones that are routinely done with primates to probe cognition. 
By properly defining the types of spiking neural networks in
the ROLLS neuromorphic processor, and by properly setting
its circuit parameters, it is therefore already possible to build
small-scale embedded systems that can use its distributed
memory resources to learn about the statistics of the input
signals and of their own internal state, while interacting with the
environment in real time, and to provide state- and
context-dependent information processing.
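The stop-learning behavior can be illustrated with a toy rate-based model of bistable, stochastically updated synapses. The sketch below is not the circuit or the model of [135]: the functions `update_synapses` and `toy_rate`, the transition probabilities, and the linear mapping from potentiated synapses to firing rate are all hypothetical. It only reproduces the qualitative protocol described above: synapses start high with 10% probability, transitions are allowed only while the postsynaptic rate lies between 50 Hz and 150 Hz, and learning therefore stops by itself once the population becomes active enough.

```python
import random


def update_synapses(w, pre_active, post_rate_hz, post_depolarized,
                    low=50.0, high=150.0, p_up=0.1, p_down=0.1, rng=random):
    """One stochastic update of the binary synapses onto one neuron (toy model).

    w:                list of 0/1 synaptic states
    pre_active:       list of bools, True if the presynaptic input spiked
    post_rate_hz:     postsynaptic firing rate, gating whether learning happens
    post_depolarized: True -> candidate potentiation, False -> candidate depression
    Outside [low, high] no transition occurs: above `high` learning stops.
    """
    if not (low <= post_rate_hz <= high):
        return list(w)
    new_w = list(w)
    for k, active in enumerate(pre_active):
        if not active:
            continue
        if post_depolarized and new_w[k] == 0 and rng.random() < p_up:
            new_w[k] = 1
        elif not post_depolarized and new_w[k] == 1 and rng.random() < p_down:
            new_w[k] = 0
    return new_w


def toy_rate(w, group):
    """Hypothetical rate model: more potentiated recurrent synapses, higher rate."""
    return 60.0 + 200.0 * sum(w[k] for k in group) / len(group)


rng = random.Random(0)
N_SYN = 256
group = range(64)                         # stimulated, recurrently coupled inputs
pre = [k in group for k in range(N_SYN)]
w0 = [1 if rng.random() < 0.1 else 0 for _ in range(N_SYN)]  # ~10% start high
w = list(w0)
for trial in range(200):
    w = update_synapses(w, pre, toy_rate(w, group), post_depolarized=True, rng=rng)
print(sum(w[:64]), toy_rate(w, group))
```

Potentiation continues only until the rate crosses 150 Hz, so the group of synapses stops changing well before all of them are potentiated, while inputs that never spike keep their initial state.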

VI. Emerging nano-technologies 

An additional resource for building complex brain-like
cognitive computing systems that are compact and low-power
is provided by the large range of emerging nano-scale
devices that are being proposed to replace the functionality
of larger and bulkier CMOS circuits currently deployed for
modeling synapses and neurons. Recent research in nano-scale
materials is revealing the possibility of using novel
devices to emulate the behavior of real synapses in
artificial neural networks, and in particular to reproduce their
learning and state-holding abilities. The general goal is to
exploit the non-volatile memory properties of these devices
and their ability to keep track of their past state dynamics
to implement massively parallel arrays of nano-scale elements
integrated into neuromorphic VLSI devices and systems. For
example, in [143] we showed how it is possible to integrate 
memristive devices in CMOS synapse arrays of the type 
used in the ROLLS neuromorphic processor of Section V. 
Fig. 8: Forming stable attractors in the ROLLS neuromorphic processor. (a) State of the bi-stable LTP synapses at the beginning,
during, and at the end of the experiment; (b) Raster plots representing the response of the neurons, both during input stimulus
presentation (for one second at t=0 s, 2 s, 4 s, 6 s), and after the stimulus is removed; (c) Example of a measured neuron membrane
potential at the beginning, during, and at the end of the training sessions, during stimulus presentation.

A promising technology is that of Resistive Random
Access Memories (R-RAMs) [144], which exploit resistance
switching phenomena [145], and are very attractive due to
their compatibility with CMOS technology. The base element
of an RRAM device is a two-terminal element with a top
electrode, a bottom one, and a thin film sandwiched between
the electrodes. By applying a voltage across the electrodes,
the electrical conductivity of the thin film material can be
reversibly changed, from a highly conductive to a highly resistive
state and vice versa, and the corresponding conductance value
can be stored for a long period. Several proposals have been
made for leveraging basic nano-scale RRAM attributes in 
synapse circuits in neuromorphic architectures [15], [143], 
[18], [92]; many of these proposals do not use these devices as 
conventional RAM cells, but distribute them within and across 
the synapse circuits in the neuromorphic architectures. It has 
been shown that these RRAM-based neuromorphic approaches 
can potentially improve density and power consumption by at 
least a factor of 10, as compared with conventional CMOS 
implementations [24]. 
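The basic behavior of such a two-terminal switching element can be captured by an idealized threshold model: a sufficiently positive voltage sets the device to its low-resistance state, a sufficiently negative one resets it to the high-resistance state, and smaller read voltages leave the stored conductance untouched. The Python sketch below assumes bipolar switching and uses a hypothetical class `ToyRRAM` with made-up thresholds and conductances; real devices switch gradually and stochastically, which is the richer behavior that neuromorphic circuits aim to exploit.

```python
from dataclasses import dataclass


@dataclass
class ToyRRAM:
    """Idealized bipolar resistive switch (all parameter values hypothetical)."""
    g_on: float = 1e-4      # conductance of the low-resistance state (S)
    g_off: float = 1e-6     # conductance of the high-resistance state (S)
    v_set: float = 1.0      # voltage at or above which the device switches on (V)
    v_reset: float = -1.0   # voltage at or below which it switches off (V)
    g: float = 1e-6         # current state; starts in the high-resistance state

    def apply(self, v):
        """Apply voltage v, update the stored state, and return the current (A)."""
        if v >= self.v_set:
            self.g = self.g_on
        elif v <= self.v_reset:
            self.g = self.g_off
        # Voltages between v_reset and v_set leave the state unchanged (non-volatile).
        return v * self.g


cell = ToyRRAM()
read_before = cell.apply(0.1)   # small read voltage: state is not disturbed
cell.apply(1.5)                 # SET pulse
read_after = cell.apply(0.1)
cell.apply(-1.5)                # RESET pulse
print(read_before, read_after, cell.g)
```

The factor of 100 between `read_after` and `read_before` is the ratio `g_on / g_off` chosen here, not a measured device property.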


Other approaches that also store memory state as resistance,
but that exhibit a range of different behaviors, include
Spin-Transfer Torque Magnetic Random Access Memories
(STT-MRAMs) [146], [147], [148], ferroelectric devices [149], and
phase change materials [150], [151], [152]. In general,
oxide-based R-RAMs, STT-MRAMs and phase-change memories
are under intense industrial development. Although these
technologies are currently difficult to access for non-industrial
applications, basic research in this domain has very high
potential, because neuromorphic circuits can harness the interesting
physics being discovered in these new devices to extend their
applicability: in addition to developing nano-scale materials
and devices that can emulate the biophysics of real synapses
and neurons, this research can lead to understanding how to
exploit their complex switching dynamics to reproduce relevant
computational primitives, such as state-dependent conductance
changes, multi-level stability and stochastic weight updates,
for use in large-scale neural processing systems [143], [20].

VII. Discussion 

The array of possible neuromorphic computing platforms
described in this work illustrates the current approaches
used in tackling the partitioning of memory and information
processing blocks on these systems. Here we discuss
the advantages and disadvantages of the different approaches
and summarize the features of the neuromorphic computing
platforms presented.

a) Large-scale simulation platforms: Neural network
simulation frameworks based on C or Python [37], [153], [154],
which can run on conventional CPU- and GPU-based systems,
offer the best flexibility and quickest development times
for simulating large-scale spiking networks. These simulations,
however, can still take a long time on conventional computing
platforms, with the additional disadvantage of very high
power consumption figures (e.g., up to tens of megawatts).
Dedicated hardware solutions have been built to support fast
simulations of neural models and to reduce their power
consumption. Hardware platforms such as SpiNNaker,
BrainScales, and NeuroGrid fall under this category.

Questions remain as to whether large-scale simulations
are necessary to answer fundamental neuroscience
questions [155], [156]. It is also not clear whether the
compromises and trade-offs made with these custom hardware
implementations will restrict the search for possible solutions
to these fundamental neuroscience questions or dissuade
neuroscientists from using such platforms for their work.
For example, the models of neurons and synapses implemented
on NeuroGrid and BrainScales are hard-wired and cannot be
changed. On the SpiNNaker platform they are programmable,
but the limited memory, the limited resolution, and the
fixed-point representation of the system impose other
constraints on the types of neurons, synapses and networks
that can be simulated.

Another critical issue that affects all large-scale simulator
platforms is the Input/Output (I/O) bottleneck. Even if these
hardware systems can simulate neural activity in real-time, or
accelerated time (e.g., 1 ms of physical time simulated in 1 µs),
loading the configuration and the parameters of a large-scale
neural network can take minutes to hours:
for example, even using the latest state-of-the-art technology 
transfer rates of 300 Gb/s (e.g., with 12x EDR InfiniBand 
links), the time required to configure a single simulation run 
of a network comprising 10® neurons with a fan-out of 1000, 
and a fan-in of 10000 synapses with 8-bit resolution weights 
would require at least 45 minutes. 
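
The configuration estimate above is a bandwidth calculation. Since the text does not spell out its accounting, the Python sketch below only shows the shape of such a lower bound under explicitly assumed conditions: the payload is streamed at the nominal link rate with no protocol overhead, each synapse carries its weight plus an optional hypothetical address field, and each outgoing connection may add a hypothetical routing-table entry. The function name and the extra fields are illustrative, and the sketch is not meant to reproduce the 45-minute figure.

```python
def config_time_s(n_neurons: float, fan_in: int, weight_bits: int,
                  link_bits_per_s: float, addr_bits: int = 0,
                  fan_out: int = 0, route_bits: int = 0) -> float:
    """Lower bound (seconds) on streaming a network configuration over one link.

    Illustrative assumptions, not taken from the text:
    - each incoming synapse costs weight_bits + addr_bits,
    - each outgoing connection costs route_bits in a routing table,
    - the link runs at its full nominal rate with zero protocol overhead.
    """
    synapse_bits = n_neurons * fan_in * (weight_bits + addr_bits)
    routing_bits = n_neurons * fan_out * route_bits
    return (synapse_bits + routing_bits) / link_bits_per_s
```

Because the bound is linear in every factor, doubling the number of neurons, the fan-in, or the weight resolution doubles the configuration time; protocol overhead and read-back verification in a real system only add to it.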

b) General purpose computing platforms: In addition
to dedicated simulation engines for neuroscience studies,
neuromorphic information processing systems have also been
proposed as general purpose non-von Neumann computing
engines for solving practical application problems, such as
pattern recognition or classification. Example platforms based
on FPGA and ASIC designs using the standard logic design
flow have been described in Section II. Systems designed
using less conventional design techniques or emerging
nano-scale technologies include the IBM TrueNorth system
(an example of the former, see Section III-B) and the
memristor- and RRAM-based neuromorphic architectures
(examples of the latter, see Section VI). The TrueNorth
system, however, does not implement learning and adaptation.
Therefore it can only be used as a low-power neural computing
engine once the values of the synaptic weights have been
computed and uploaded to the network. The complex learning
process that determines these synaptic weights is typically
carried out on power-hungry standard computers or
supercomputers. This rules out the possibility of using this
and similar architectures in dynamic situations in which the
system is required to adapt to changes in the environment or
in its input signals. Endowing these types of architectures
with learning mechanisms, e.g., using memristive or
RRAM-based devices, could lead to the development of
non-von Neumann computing platforms that are more adaptive
and general purpose. The development of these adaptation
and learning mechanisms and of the nano-scale memristive
technologies, however, is still at an early stage, and the
problems related to the control of learning dynamics, stability,
and variability are still an active area of research [20], [157].
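
Why such a core cannot adapt on its own can be seen in a simplified Python sketch of one time step of a TrueNorth-like neurosynaptic core. As described for TrueNorth in Section III-B, synapses act as binary memory elements and computation is relegated to the neurons; here every table is plain lookup data written once at upload time. The axon-type indirection that gives binary connections a signed weight, the integer leak and the reset-to-zero rule are assumptions made for illustration, and `core_step`, `axon_type` and `type_weights` are hypothetical names, not the chip's interface.

```python
from typing import List


def core_step(spikes_in: List[int], crossbar: List[List[int]],
              axon_type: List[int], type_weights: List[List[int]],
              v: List[int], threshold: int, leak: int) -> List[int]:
    """One tick of a simplified, TrueNorth-like neurosynaptic core (hypothetical).

    crossbar[i][j] is a binary synapse: 1 if axon i connects to neuron j.
    The synapse itself stores no weight; neuron j looks up the weight for the
    type of the active axon in type_weights[j][axon_type[i]].
    All tables are written once at upload time and never modified here.
    Updates membrane potentials v in place and returns output spikes.
    """
    out = [0] * len(v)
    for j in range(len(v)):
        for i, s in enumerate(spikes_in):
            if s and crossbar[i][j]:
                v[j] += type_weights[j][axon_type[i]]
        v[j] = max(v[j] - leak, 0)  # constant leak, floored at zero
        if v[j] >= threshold:
            out[j] = 1
            v[j] = 0  # reset after firing
    return out
```

Changing the behavior of such a core means recomputing `crossbar` and `type_weights` elsewhere and uploading them again, which is exactly the offline learning loop described above.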

c) Small scale special purpose neuromorphic systems:
Animal brains are not general purpose computing platforms.
They are highly specialized structures that evolved to increase
the chances of survival in hazardous environments with limited
resources and varying conditions [158]. They represent an
ideal computing technology for implementing robust feature
extraction, pattern recognition, associative learning, sequence
learning, planning, decision making, and ultimately for
generating behavior [159]. The original neuromorphic engineering
approach [1], [2] proposed to develop and use electronic
systems precisely for this purpose: to build autonomous cognitive
agents that produce behavior in response to multiple types of
varying input signals and different internal states [142], [107].
As argued in Section I, the best way to reach this goal is to use
electronic circuits biased in the sub-threshold regime [12], [4],
and to directly emulate the properties of real neural systems
by exploiting the physics of the silicon medium. Examples of
systems that follow this approach are the ones described in
Sections III-A and V. The types of signals that these systems
are optimally suited to process include multi-dimensional
auditory and visual inputs, low-dimensional temperature and
pressure signals, bio-signals measured in living tissue, or even
real-time streaming digital bit strings, e.g., obtained from
Internet, Wi-Fi, or telecommunication data. Examples of
application domains that could best exploit the properties of
these neuromorphic systems include wearable personal assistants,
co-processors in embedded/mobile devices, intelligent
brain-machine interfaces for prosthetic devices, and sensory-motor
processing units in autonomous robotic platforms.

d) Memory and information processing: Classical von
Neumann computing architectures face the von Neumann
bottleneck problem [6]. We showed in Section II how current
attempts to reduce this problem, e.g., by introducing cache
memory close to the CPU or by using general purpose GPUs, are
not viable if energy consumption is factored in [36]. We then


PROCEEDINGS OF THE IEEE, VOL. X, NO. X, JUNE 2015 




described dedicated FPGA and full custom ASIC architectures
that carefully balance the use of memory and information
processing resources for implementing deep networks [42], [57],
[40] or large-scale computational neuroscience models [67].
While these dedicated architectures, still based on frames
or graded (non-spiking) neural network models, represent an
improvement over CPU and GPU approaches, the event-based
architectures described in Sections II-B1 and II-C improve
access to cache memory structures even further, because of
their better use of locality in both space and time.
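
The locality argument for event-based processing can be illustrated with a generic Python sketch, not modeled on any of the specific architectures of Sections II-B1 and II-C: spikes are processed as time-ordered events, and each event reads and writes only the state of its targets, looked up in a hypothetical fan-out table. The non-leaky integrate-and-fire units, the one-step delay and the function name `run_event_driven` are assumptions made for illustration.

```python
import heapq
from typing import Dict, List, Tuple


def run_event_driven(initial_events: List[Tuple[int, int]],
                     fanout: Dict[int, List[Tuple[int, float]]],
                     n_neurons: int, threshold: float = 1.0,
                     t_max: int = 100) -> Tuple[List[Tuple[int, int]], int]:
    """Event-driven update of non-leaky integrate-and-fire units (illustrative).

    initial_events: (time_step, source_id) input spikes.
    fanout: source_id -> list of (target_id, weight), a hypothetical table.
    Only the targets of each event are read and written, so state accesses
    scale with activity, not with n_neurons.
    Returns the emitted (time_step, neuron_id) spikes and the access count.
    """
    v = [0.0] * n_neurons
    queue = list(initial_events)
    heapq.heapify(queue)  # process events in time order
    emitted: List[Tuple[int, int]] = []
    accesses = 0
    while queue:
        t, src = heapq.heappop(queue)
        if t > t_max:
            break
        for tgt, w in fanout.get(src, []):
            accesses += 1
            v[tgt] += w
            if v[tgt] >= threshold:
                v[tgt] = 0.0  # reset
                emitted.append((t + 1, tgt))  # one-step delay
                heapq.heappush(queue, (t + 1, tgt))
    return emitted, accesses
```

For example, with `n_neurons=1000`, a single input spike on neuron 0 whose fan-out is `[(1, 1.0), (2, 0.5)]`, and neuron 1 projecting to neuron 3 with weight 1.0, the loop makes three state accesses in total, whereas a frame-based update would touch all 1000 states at every step.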

The SpiNNaker system, described in Section III-A, also
exploits event-based processing and communication to optimize
the use of memory and computation resources: computation is
carried out by the system's parallel ARM cores, while memory
resources have been carefully distributed within each core
(e.g., for caching data and instructions), across in-package
DRAM memory chips (e.g., for storing program variables
encoding network parameters) and in routing tables (e.g., for
storing and implementing the network connectivity patterns).
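
To show how a routing table can hold connectivity compactly, here is a simplified Python sketch of a key/mask multicast lookup in the spirit of such routers: a packet matches an entry when its key, masked, equals the entry's key, so a single entry can cover a whole population of source neurons. The key layout, port encoding, first-match rule and the use of `None` for "no match" are assumptions made for illustration, not SpiNNaker's exact specification; `RouteEntry`, `lookup` and `TABLE` are hypothetical names.

```python
from typing import List, NamedTuple, Optional


class RouteEntry(NamedTuple):
    key: int    # value the masked packet key must equal
    mask: int   # bits of the packet key that are compared
    route: int  # bit vector of output ports (hypothetical encoding)


def lookup(table: List[RouteEntry], packet_key: int) -> Optional[int]:
    """Return the route of the first entry whose masked key matches.

    None stands in for "no entry matched"; default routing is not modelled.
    """
    for entry in table:
        if (packet_key & entry.mask) == entry.key:
            return entry.route
    return None


# Hypothetical key layout: upper byte = source core, lower byte = neuron index.
# One entry per source population instead of one per neuron.
TABLE = [
    RouteEntry(key=0x0100, mask=0xFF00, route=0b101),  # core 1 -> ports 0 and 2
    RouteEntry(key=0x0200, mask=0xFF00, route=0b010),  # core 2 -> port 1
]
```

With this layout, two entries route events from up to 512 source neurons, which is one way the connectivity memory can scale with the number of populations rather than with the number of neurons.
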
SpiNNaker and the event-based architectures of Sections II-B1
and II-C, however, still separate, to a large extent, the
computation from memory access, and implement them in
physically different circuits and modules. Conversely, the
TrueNorth, NeuroGrid, BrainScales and ROLLS neuromorphic processor
architectures described in Sections III and V represent a
radical departure from the classical von Neumann computer
design style. In TrueNorth (Section III-B), for example, the
synapses (i.e., memory elements) are physically adjacent to
the neuron circuits (i.e., computation elements), and multiple
neuro-synaptic cores are distributed across the chip surface.
Synapses in this architecture are used as basic binary memory
elements and computation is mostly relegated to the neuron
circuits (memory and computation elements are distributed and
physically close to each other, but not truly co-localized). As
discussed in Section III-C, the NeuroGrid architecture follows
a substantially different approach. Rather than implementing
binary memory circuits in each synapse, it uses circuits with
global shared parameters that emulate the temporal dynamics
of real synapses with biologically realistic time constants.
Computation is therefore carried out in both neurons and
synapses, and the main memory requirements for programming
and re-configuring the network's topology and function
are in the routing tables and in the shared global parameters.
Since a large part of the network topology is hardwired,
the range of possible neural models and functions that can
be best emulated by NeuroGrid is restricted (by design) to
models of cortical structures (e.g., parts of visual cortex).
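
The memory implication of shared synaptic parameters can be sketched behaviorally in Python: a single time constant and a single gain are stored once for a whole array of synapses, and each synapse keeps only its decaying current, so the dynamics themselves carry the short-term memory of recent input spikes. This is an assumption-laden illustration, not a model of NeuroGrid's circuits; the class name, the first-order exponential kernel and the 10 ms default are hypothetical choices.

```python
import math
from typing import List


class SharedParamSynapses:
    """Exponential synapse currents whose parameters are shared by the array.

    Only tau_s and gain are stored (once); each synapse keeps just its
    current state. A behavioural sketch, not a circuit model.
    """

    def __init__(self, n: int, tau_s: float = 0.01, gain: float = 1.0):
        self.tau_s = tau_s  # shared time constant (10 ms assumed)
        self.gain = gain    # shared jump per input spike
        self.i_syn = [0.0] * n

    def step(self, dt: float, spikes: List[int]) -> List[float]:
        decay = math.exp(-dt / self.tau_s)  # same factor for every synapse
        for k, s in enumerate(spikes):
            self.i_syn[k] = self.i_syn[k] * decay + self.gain * s
        return self.i_syn
```

Reprogramming this array means changing two numbers rather than one weight per synapse, which is why the remaining memory requirements concentrate in the routing tables.
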
The BrainScales project (Section III-D), on the other hand,
aims to support the simulation of large-scale networks of
arbitrary topology that include non-linear operations
at the synapse level (e.g., spike-timing dependent
plasticity). Therefore the memory structures that set the
network parameters are truly distributed and co-located with
the computational elements. These include floating gate devices
that set neuron and synapse parameter values, as well as Static
Random Access Memory (SRAM) and Digital-to-Analog
Converter (DAC) circuits that store the synaptic weights.
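
Spike-timing dependent plasticity, the synapse-level non-linear operation mentioned above, is commonly written as a pair-based rule with exponential timing windows. The Python sketch below uses that textbook form together with a weight stored as a small integer code, the kind of value an SRAM word feeding a DAC could hold. The window parameters, the 4-bit resolution and the round-and-clip update are illustrative assumptions, not BrainScales' actual learning circuitry.

```python
import math


def stdp_dw(t_pre: float, t_post: float, a_plus: float = 0.01,
            a_minus: float = 0.012, tau_plus: float = 0.02,
            tau_minus: float = 0.02) -> float:
    """Pair-based STDP weight change for one pre/post spike pair.

    Pre before post (dt > 0) potentiates, post before pre depresses,
    with exponentially decaying windows. Parameter values are illustrative.
    """
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * math.exp(-dt / tau_plus)
    if dt < 0:
        return -a_minus * math.exp(dt / tau_minus)
    return 0.0


def apply_to_code(code: int, dw: float, bits: int = 4, w_max: float = 1.0) -> int:
    """Apply dw to a weight stored as an integer code (4 bits assumed),
    rounding to the nearest level and clipping to the representable range."""
    levels = (1 << bits) - 1
    w = code / levels * w_max + dw
    return max(0, min(levels, round(w / w_max * levels)))
```

With a coarse code, small timing-dependent changes are lost to rounding unless they accumulate, which is one reason the resolution of the stored weights matters for on-chip learning.
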
The BrainScales memory elements that are used to store
the network connectivity patterns, on the other hand, are
distributed across multiple devices, and interfaced to very fast
routing circuits designed following the conventional digital
communication approach [31]. A compromise between the
highly flexible and reconfigurable, but energy-consuming,
approach of BrainScales and the ultra-low-power, but less
configurable, approach of NeuroGrid is the one followed
by the ROLLS neuromorphic processor (Section V).
In this system all memory resources, both for routing digital
events and for storing synaptic and neural circuit parameters,
are tightly integrated with the synapse and neuron computing
circuits. Since the memory of the events being processed
is stored in the dynamics of the circuits, which have time
constants that are well matched to the type of computation
being carried out (see also the NeuroGrid real-time arguments
on page 6), memory and computation are co-localized. The
strategies used by the ROLLS neuromorphic processor for
implementing multiple types of memory structures analogous
to those used in conventional von Neumann architectures are
the ones summarized in Table I. For example, we showed
in Section V how to build associative memories, by training
a network of plastic neurons to memorize different patterns
in four different attractors. The long-term memory changes
were made in the network's synaptic weights, via the chip's
spike-timing based learning mechanisms. Bi-stable or
short-term memory structures can be made using the learned
network attractors, which represent state-holding elements that
emulate working-memory structures in cortical circuits, and
which can be employed for state-dependent computation, for
example implementing neural analogs of Finite State
Machines (FSMs) [142], [160]. Alternative training protocols and
network connectivity patterns can be used in the same chip
to carry out different types of neural information processing
tasks, such as binary classification, e.g., for image recognition
tasks [134]. Admittedly, the ROLLS neuromorphic processor
comprises fewer neurons than those implemented in the
large-scale neural systems surveyed in Section III, so the problem
of allocating on-chip memory resources for routing events
and configuring different connectivity patterns is mitigated. To
build more complex neuromorphic systems that can interact
with the world in real-time and express cognitive abilities, it
will be necessary to consider more complex systems, which
combine multiple neural information processing modules with
diverse and specialized functionalities, very much like the
cortex uses multiple areas with different sensory, motor, and
cognitive functional specifications [161], [162], and which
restrict the possible connectivity patterns to a subset that
maximizes functionality and minimizes routing memory usage, very
much like the cortex uses patchy connectivity patterns with sparse
long-range connections and dense short-range ones [163],
[164]. Within this context, the optimal neuromorphic processor
would be a multi-core device in which each core could be
implemented following different approaches, and in which the
routing circuits and memory structures would be distributed
(e.g., within and across cores), heterogeneous (e.g., using
CAM, SRAM, and/or even memristive devices) and
hierarchical (e.g., with intra-core level-one routers, inter-core
level-two routers, inter-chip level-three routers, etc.) [165]. The
systems surveyed in this paper represent sources of inspiration
for choosing the design styles of the neural processing
modules in each core and the memory structures to use for
different application areas.
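
The hierarchical routing scheme outlined above can be sketched in Python by giving each neuron a (chip, core, neuron) address and asking which router level must carry an event between two addresses, using the same level numbering as the example hierarchy in the text. The address format and function names are hypothetical. Counting connections per level shows why patchy connectivity helps: dense short-range connections stay in level-one memory, and only the sparse long-range fraction consumes higher-level routing tables.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Tuple


class Address(NamedTuple):
    chip: int
    core: int
    neuron: int


def routing_level(src: Address, dst: Address) -> int:
    """Router level that must carry an event from src to dst.

    1 = intra-core router, 2 = inter-core router (same chip),
    3 = inter-chip router, following the example hierarchy in the text.
    """
    if src.chip != dst.chip:
        return 3
    if src.core != dst.core:
        return 2
    return 1


def level_histogram(connections: Iterable[Tuple[Address, Address]]) -> Counter:
    """Count how many connections each router level must store."""
    return Counter(routing_level(s, d) for s, d in connections)
```

For instance, a neuron with eight local targets in its own core and one target on another chip adds eight entries at level one and a single entry at level three.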

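The associative-memory experiment described in the discussion (four patterns stored as attractors) can be illustrated in a deliberately simplified form with a classic Hopfield network: binary ±1 units, one-shot Hebbian outer-product learning, and asynchronous recall. This swaps in a non-spiking textbook model for the ROLLS chip's spiking neurons and spike-timing based plastic synapses. It only shows how a stored pattern acts as a state-holding attractor into which a corrupted cue falls back. The function names and the four 16-unit orthogonal patterns are illustrative choices.

```python
from typing import List


def train_hebbian(patterns: List[List[int]]) -> List[List[float]]:
    """Hebbian (outer-product) weights for +/-1 patterns, no self-connections."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def recall(w: List[List[float]], state: List[int], sweeps: int = 10) -> List[int]:
    """Asynchronous sign updates until the state settles into an attractor."""
    s = list(state)
    n = len(s)
    for _ in range(sweeps):
        changed = False
        for i in range(n):
            h = sum(w[i][j] * s[j] for j in range(n))
            new = 1 if h >= 0 else -1
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break  # fixed point reached
    return s


if __name__ == "__main__":
    # Four mutually orthogonal 16-unit patterns (Walsh functions).
    p1 = [1] * 8 + [-1] * 8
    p2 = ([1] * 4 + [-1] * 4) * 2
    p3 = [1, 1, -1, -1] * 4
    p4 = [1, -1] * 8
    w = train_hebbian([p1, p2, p3, p4])
    cue = list(p1)
    cue[0] = -cue[0]  # corrupt one unit
    print(recall(w, cue) == p1)  # True
```

Once the weights are learned, the network's state itself holds the memory: the recalled pattern persists without further input, which is the property that lets learned attractors serve as state-holding elements.
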
VIII. Conclusions 

In this work we presented a survey of state-of-the-art
neuromorphic systems and their usability for supporting deep
network models, cortical network models, and brain-inspired
cognitive architectures. We outlined the trade-offs that these
systems face in terms of memory requirements, processing
speed, bandwidth, and their ability to implement the
different types of computational primitives found in biological
neural systems. We presented a mixed signal analog/digital
neuromorphic processor and discussed how that system, as
well as analogous ones being developed by the international
research community, can be used to implement cognitive
computing. Finally, in the discussion section, we highlighted
the advantages and disadvantages of the different approaches
being pursued, pointing out their strengths and weaknesses.
In particular, we argued that while there currently is a range
of very interesting and promising neuromorphic information
processing systems available, they do not yet provide
substantial advantages over conventional computing architectures
for large-scale simulations, nor are they complex enough to
implement specialized small-scale cognitive agents that can
interact autonomously in any environment.

The tremendous progress in micro-electronics and
nanotechnologies in recent years has been paralleled by remarkable
progress in both experimental and theoretical neuroscience. To
obtain significant breakthroughs in neuromorphic computing
systems that can demonstrate the features of biological systems,
such as robustness, learning abilities and possibly cognitive
abilities, continuing research and development efforts are
required in this interdisciplinary approach, which involves
neuroscientists, computer scientists, technologists, and material
scientists. This can be achieved by training a new generation
of researchers with interdisciplinary skills and by encouraging
the communities specializing in these different disciplines to
work together as closely as possible, as is currently being done
in several computational neuroscience academic institutions,
such as the Institute of Neuroinformatics of the
University of Zurich and ETH Zurich, or at neuroscience and
neuromorphic engineering workshops, such as the Telluride
and CapoCaccia Neuromorphic Engineering Workshops [166],
[167].